Abstract

Scott (1985a, 1985b) has recently studied two simple variations on the ordinary histogram,
namely the frequency polygon and the average shifted histogram, and found that they are able
to compete with, for example, kernel density estimators in performance while retaining the
advantage of being conceptually and computationally simple. The present paper proposes
a way of generalizing frequency polygons to d-dimensional space that performs better than
Scott's generalization. Expressions for integrated mean squared error and for integrated mean
absolute deviation plus integrated absolute bias are obtained for generalized frequency polygons,
for average shifted histograms, and for generalized frequency polygons of average shifted
histograms. These expressions are used to give guidelines for window sizes.

Key words and phrases: frequency polygons, average shifted histograms, multi-dimensional,
integrated mean squared error, integrated mean absolute deviation, integrated absolute bias.

1. Introduction.


A  simple  way  in  which  to  smooth  a  (univariate)  histogram  is  to  connect  midbin  values 
with  straight  lines.  This  is  the  frequency  polygon,  which  of  course  has  been  used  for  display 
purposes  at  least  since  1900.  It  was  not  demonstrated  until  recently,  however,  that  the  gain  of 
this  simple  linear  smoothing  is  substantial,  and  that  the  frequency  polygon  comes  a  long  way 
towards  matching  more  sophisticated  density  estimators,  while  at  the  same  time  retaining  the 
advantage of being conceptually and computationally simple; see Scott (1985a, 1985b).

It  is  not  obvious  how  the  notion  of  a  frequency  polygon  should  be  extended  to  two 
and  higher  dimensions.  Scott  (1985a)  gives  one  possible  definition  for  the  bivariate  case, 
but  a  general  d-dimensional  definition  along  his  lines  would  be  awkward,  and  expressions  for 
integrated  mean  squared  error  (IMSE),  the  traditional  criterion  by  which  to  judge  density 
estimators,  would  be  very  difficult  to  obtain.  In  Section  2  we  discuss  a  natural  extension 
termed  the  generalized  frequency  polygon,  obtain  the  IMSE,  and  show  that  it  performs  better 
than  Scott’s  version.  We  are  also  able  to  obtain  an  expression  for  another  natural  criterion, 
the integrated mean absolute deviation plus integrated absolute bias (IMAD + IAB), in the
general  d-dimensional  case.  These  expressions  provide  guidelines  for  the  choice  of  binwidths, 
and  are  informative  for  purposes  of  comparison  with  other  density  estimators. 

Another  neat  construction  of  Scott  (1985b)  is  the  average  shifted  histogram,  which  shares 
with the frequency polygon the virtues of matching (for example) kernel estimators in performance
while still being computationally more feasible when faced with the problem of
evaluating  the  estimator  many  times  from  a  large  data  set,  which,  for  example,  is  the  task 
of  classification  procedures  built  on  nonparametric  density  estimation.  Scott  (1985b)  obtains 
IMSE  expressions  for  dimensions  d  =  1,2.  His  results  are  supplemented  in  Section  3  with 
d-dimensional results for both IMSE and IMAD + IAB.

It  is  only  natural  to  try  out  the  two  tricks  mentioned  above  in  tandem,  and  define  the 
frequency  polygon  of  the  average  shifted  histogram.  Again,  Scott  (1985b)  has  IMSE  expressions 
for  the  uni-  and  bivariate  case,  but  notes  that  explicit  multivariate  results  are  not  generally 
available. Section 4 studies generalized frequency polygons of average shifted histograms, and
once more expressions for IMSE and for IMAD + IAB are obtained.

Some consequences of these results are briefly discussed in Section 5, and comparisons
with  kernel  density  estimates  are  made. 

Scott  and  others  have  emphasized  the  use  of  these  histogram-type  density  estimates  for 
display  purposes,  even  in  three  and  four  dimensions  (!),  see  for  example  Scott  and  Thompson 
(1983).  Apart  from  general  theoretical  interest,  the  present  work  has  been  motivated  by  the 
possibility  of  using  say  generalized  frequency  polygons  of  average  shifted  histograms  as  building 
blocks in classifiers in symbol recognition and reconstruction of remotely sensed images. Typical
characteristics of these technological problems are large training sets and high-dimensional
feature  vectors.  The  optimal  classification  rule  depends  upon  a  posteriori  probabilities  that 
again  are  expressed  in  terms  of  class  densities,  and  a  natural  way  to  proceed  is  to  estimate 
these nonparametrically. Classifiers built along these lines may use density estimators as "black
boxes", and the need to display and scrutinize aspects of the data is secondary. These remarks
also  provide  the  motivation  for  having  available  a  general  d-dimensional  theory. 

To get started, let $X_1, \ldots, X_n$ be a sample of independent observations from an unknown
smooth density $f$ in $\mathbb{R}^d$. A common point of departure for later refinements will be a standard
histogram density estimate $\hat f_0$ defined on a grid of cells with centres $x_k$ and volume $h_1 \cdots h_d$,
i.e.

$$\hat f_0(x) = (h_1 \cdots h_d)^{-1}\, Y_0(k)/n, \qquad x \in I_0(k), \tag{1.1}$$

where

$$I_0(k) = \prod_{j=1}^{d} \bigl(x_{k,j} - \tfrac12 h_j,\; x_{k,j} + \tfrac12 h_j\bigr] \tag{1.2}$$

is cell number $k$, and $Y_0(k)$ is the number of $X_i$'s falling in this cell.
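
As a concrete illustration of (1.1) and (1.2), the following is a minimal sketch of a d-dimensional histogram density estimate. The function name `histogram_density` and the argument `origin` (the lower corner of one reference cell) are assumptions made for this example and do not come from the paper; cells are taken half-open on the left, as in (1.2).

```python
import numpy as np

def histogram_density(data, origin, h):
    """Return a function x -> f0_hat(x) for the histogram (1.1).

    data   : (n, d) array of observations
    origin : length-d array, lower corner of one reference cell (assumed name)
    h      : length-d array of binwidths h_1, ..., h_d
    """
    data = np.atleast_2d(np.asarray(data, dtype=float))
    origin = np.asarray(origin, dtype=float)
    h = np.asarray(h, dtype=float)
    n = data.shape[0]

    # Integer cell index of every observation; cells are (lo, lo + h].
    idx = np.ceil((data - origin) / h).astype(int) - 1
    counts = {}
    for key in map(tuple, idx):
        counts[key] = counts.get(key, 0) + 1

    def f0_hat(x):
        x = np.asarray(x, dtype=float)
        key = tuple(np.ceil((x - origin) / h).astype(int) - 1)
        # Cell count divided by n times the cell volume h_1 ... h_d.
        return counts.get(key, 0) / (n * np.prod(h))

    return f0_hat
```

Storing counts in a dictionary keyed by cell index keeps memory proportional to the number of occupied cells, which matters when d is large.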

Although exact expressions sometimes result from the reasoning in what follows, let
us make clear that our analysis is mainly a large sample one, where $h_1 \to 0, \ldots, h_d \to 0$,
$n h_1 \cdots h_d \to \infty$ as the number of observations tends to infinity. These requirements are
standard and ensure that $\hat f_0$ above is consistent.

We shall not try to be as general as possible and shall be content to derive results for
densities $f$ with support equal to the union of the histogram cells and with continuous
derivatives $f_j(x) = \partial f(x)/\partial x_j$, $f_{jl}(x) = \partial^2 f(x)/\partial x_j \partial x_l$,
$f_{jls}(x) = \partial^3 f(x)/\partial x_j \partial x_l \partial x_s$ of the first, second, and third
order.

2. Generalized Frequency Polygons.

Consider the histogram defined in (1.1), (1.2). To generalize the notion of a frequency
polygon, we should for a given $x$ linearly combine nearby values of $\hat f_0$ and hopefully get an
improved, smoothed version. Fix a particular $k$, and shift attention to the new cell

$$I(k) = \prod_{i=1}^{d} \bigl(x_{k,i},\; x_{k,i} + h_i\bigr] = \prod_{i=1}^{d} \bigl(a_i - \tfrac12 h_i,\; a_i + \tfrac12 h_i\bigr], \tag{2.1}$$

where $a = x_k + \tfrac12 h$ is in the middle of $2^d$ histogram cells.

Figure 1. The GFP cell $I(k)$ lies within $2^d$ histogram cells $I_0(k; j_1, \ldots, j_d)$.

We want to define a generalized frequency polygon (GFP) in this "inner cell" by smoothing
the $2^d$ histogram values $\hat f_0(x_{k,1} + j_1 h_1, \ldots, x_{k,d} + j_d h_d) = \hat f_0(x_k + jh)$,
$j = (j_1, \ldots, j_d) \in \{0,1\}^d$.

Write

$$\hat f(x) = \sum_j c_j(x)\, \hat f_0(x_k + jh) = \sum_j c_j(x)\,(h_1 \cdots h_d)^{-1}\, Y_0(k;j)/n, \qquad x \in I(k), \tag{2.2}$$

where $Y_0(k; j_1, \ldots, j_d) = Y_0(k;j)$ is the number of $X_i$'s falling in

$$I_0(k;j) = \prod_{i=1}^{d} \bigl(x_{k,i} + (j_i - \tfrac12) h_i,\; x_{k,i} + (j_i + \tfrac12) h_i\bigr]
= \prod_{i=1}^{d} \bigl(a_i + (j_i - 1) h_i,\; a_i + j_i h_i\bigr]. \tag{2.3}$$

The $2^d$ $c_j$-functions appearing in (2.2) are to be specified later. Natural immediate requirements
are $c_j(x) \ge 0$, $\sum_j c_j(x) = 1$, making sure that $\hat f$ is a density in $\mathbb{R}^d$. The ordinary frequency
polygon is of the form (2.2), with $d = 1$,

$$c_0(x) = 1 - u(x), \quad c_1(x) = u(x); \qquad u(x) = (x - x_k)/h, \quad x \in [x_k, x_k + h] = [a - \tfrac12 h,\, a + \tfrac12 h]. \tag{2.4}$$

2.1.  Bias  of  the  GFP. 

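One might consider estimates of the form (2.2) with weights $c_j(x)$ determined by
the data; the discussion in the following is however limited to the case of non-random $c_j(x)$
functions.

The exact expectation of the GFP is

$$E\hat f(x) = \sum_j c_j(x)\,(h_1 \cdots h_d)^{-1} p_0(j), \quad x \in I(k), \qquad \text{where} \quad p_0(j) = \int_{I_0(k;j)} f\,dx. \tag{2.5}$$

Approximations to $E\hat f(x)$ can now be worked out, for example based on Taylor expansions
around $x$. It serves our present purpose best, however, to expand around the point $a = x_k + \tfrac12 h$.
We get

$$\begin{aligned}
p_0(j)/(h_1 \cdots h_d) &= \int_{I_0(k;j)} \Bigl\{ f(a) + \sum_{i=1}^{d} f_i(a)(x_i - a_i)
 + \tfrac12 \sum_{i,l} f_{il}(a)(x_i - a_i)(x_l - a_l) + \cdots \Bigr\}\,dx \big/ (h_1 \cdots h_d) \\
&\doteq f(a) + \sum_{i=1}^{d} f_i(a)\,(-1)^{j_i+1}\,\tfrac12 h_i + \sum_{i=1}^{d} f_{ii}(a)\,\tfrac16 h_i^2
 + \sum_{i \ne l} f_{il}(a)\,(-1)^{j_i+j_l}\,\tfrac18 h_i h_l,
\end{aligned} \tag{2.6}$$

where $\doteq$ here and below is used after shaving off higher order terms. This makes exposition
easier; regularity conditions are discussed later. Hence

$$\begin{aligned}
E\hat f(x) &\doteq \sum_j c_j(x) \Bigl\{ f(a) + \sum_{i=1}^{d} f_i(a)(-1)^{j_i+1}\tfrac12 h_i + \sum_{i=1}^{d} f_{ii}(a)\tfrac16 h_i^2
 + \sum_{i \ne l} f_{il}(a)(-1)^{j_i+j_l}\tfrac18 h_i h_l \Bigr\} \\
&= f(a) + \sum_{i=1}^{d} f_i(a)\,\tfrac12 h_i \sum_j (-1)^{j_i+1} c_j(x)
 + \sum_{i=1}^{d} f_{ii}(a)\,\tfrac16 h_i^2 + \sum_{i \ne l} f_{il}(a)\,\tfrac18 h_i h_l \sum_j (-1)^{j_i+j_l} c_j(x).
\end{aligned}$$

It follows that

$$\begin{aligned}
\mathrm{bias}(x) &= E\hat f(x) - f(x) \\
&\doteq \sum_{i=1}^{d} f_i(a) \Bigl\{ \tfrac12 h_i \sum_j (-1)^{j_i+1} c_j(x) - (x_i - a_i) \Bigr\}
 + \sum_{i=1}^{d} f_{ii}(a) \bigl\{ \tfrac16 h_i^2 - \tfrac12 (x_i - a_i)^2 \bigr\} \\
&\quad + \sum_{i \ne l} f_{il}(a) \Bigl\{ \tfrac18 h_i h_l \sum_j (-1)^{j_i+j_l} c_j(x) - \tfrac12 (x_i - a_i)(x_l - a_l) \Bigr\}.
\end{aligned} \tag{2.7}$$

Expressions for integrated squared bias and integrated absolute bias in terms of any given
set of $c_j$-functions can now be obtained, but we will refrain from doing so until the best choice
of $c_j$-functions has been settled on.

2.2. Variance of the GFP.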


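Using multinomial moments we get from (2.2)

$$\begin{aligned}
\mathrm{Var}\{\hat f(x)\} &= (h_1 \cdots h_d)^{-2} \sum_j c_j(x)^2\, \tfrac1n\, p_0(j)\{1 - p_0(j)\}
 - (h_1 \cdots h_d)^{-2} \sum_{j \ne j'} c_j(x)\, c_{j'}(x)\, \tfrac1n\, p_0(j)\, p_0(j') \\
&= (n h_1 \cdots h_d)^{-1} \sum_j c_j(x)^2\, p_0(j)/(h_1 \cdots h_d)
 - \tfrac1n \Bigl\{ \sum_j c_j(x)\, p_0(j)/(h_1 \cdots h_d) \Bigr\}^2 \\
&\doteq (n h_1 \cdots h_d)^{-1} f(a) \sum_j c_j(x)^2 - \tfrac1n f(a)^2.
\end{aligned} \tag{2.8}$$

This approximation holds for $x$ in $I(k) = (a - \tfrac12 h,\, a + \tfrac12 h]$.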

2.3.  The  right  choice:  The  linear  blend  interpolator. 

Introduce 

$$u_i = u_i(x) = (x_i - x_{k,i})/h_i, \qquad x_i \in [x_{k,i},\, x_{k,i} + h_i] = [a_i - \tfrac12 h_i,\, a_i + \tfrac12 h_i], \tag{2.9}$$

$i = 1, \ldots, d$. $u_i$ goes linearly from 0 to 1 as $x_i$ moves from the left to the right end of the $i$th
side of the cell $I(k)$, cf. (2.1). There is no loss of generality in representing the $c_j(x)$ functions
in terms of $u_1, \ldots, u_d$.

Now turn to the choice of the $2^d$ $c_j(x)$ functions. In addition to $c_j(x) \ge 0$ and $\sum_j c_j(x) = 1$
we should impose $\hat f = \hat f_0$ at the $2^d$ corners of $I(k)$, i.e. $c_{j_1, \ldots, j_d}(x) = 1$ when
$(u_1, \ldots, u_d) = (j_1, \ldots, j_d)$; that $\hat f$ is the plain average of the $2^d$ nearby corner values
at the centre point $a$; and perhaps some symmetry. In some sense we want $c_j(x)$ to measure closeness
of $x$ to corner $j$.

Our choice is

$$c_j(x) = (1 - u_1)^{1-j_1} u_1^{j_1} \cdots (1 - u_d)^{1-j_d} u_d^{j_d}. \tag{2.10}$$

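These functions satisfy the requirements above. More importantly, the algebraic expressions
(2.7), (2.8) for bias and variance simplify dramatically, and IMSE (strictly speaking, the leading
terms of the IMSE) with any other choice of $c_j$-functions will be larger than IMSE with (2.10).

Now expressions for

$$\mathrm{IMSE} = E \int \{\hat f(x) - f(x)\}^2\,dx = \int \mathrm{Var}\{\hat f(x)\}\,dx + \int \{\mathrm{bias}(x)\}^2\,dx \tag{2.11}$$

can be worked out. It is a matter of checking to arrive at

$$\sum_j (-1)^{j_i+1} c_j(x) = 2u_i - 1 = 2\,\frac{x_i - a_i}{h_i}, \tag{2.12}$$

$$\sum_j (-1)^{j_i+j_l} c_j(x) = (2u_i - 1)(2u_l - 1) = 4\,\frac{x_i - a_i}{h_i}\,\frac{x_l - a_l}{h_l}, \quad i \ne l, \tag{2.13}$$

$$\sum_j c_j(x)^2 = \prod_{i=1}^{d} \{(1 - u_i)^2 + u_i^2\}. \tag{2.14}$$

Hence from (2.7)

$$\mathrm{bias}(x) \doteq \sum_{i=1}^{d} f_{ii}(a) \bigl\{ \tfrac16 h_i^2 - \tfrac12 (x_i - a_i)^2 \bigr\} \tag{2.15}$$

and

$$\begin{aligned}
\int_{I(k)} \{\mathrm{bias}(x)\}^2\,dx &\doteq \sum_{i=1}^{d} \{f_{ii}(a)\}^2 \int_{I(k)} \bigl\{ \tfrac16 h_i^2 - \tfrac12 (x_i - a_i)^2 \bigr\}^2 dx \\
&\quad + \sum_{i \ne l} f_{ii}(a) f_{ll}(a) \int_{I(k)} \bigl\{ \tfrac16 h_i^2 - \tfrac12 (x_i - a_i)^2 \bigr\} \bigl\{ \tfrac16 h_l^2 - \tfrac12 (x_l - a_l)^2 \bigr\}\,dx \\
&= \Bigl[ \tfrac{49}{2880} \sum_{i=1}^{d} \{f_{ii}(a)\}^2 h_i^4 + \tfrac{1}{64} \sum_{i \ne l} f_{ii}(a) f_{ll}(a)\, h_i^2 h_l^2 \Bigr] h_1 \cdots h_d.
\end{aligned}$$

Summing over all cells and approximating integrals with Riemann sums in the usual way we
arrive at

$$\int \{\mathrm{bias}(x)\}^2\,dx \doteq \tfrac{49}{2880} \sum_{i=1}^{d} h_i^4 \int (f_{ii})^2\,dx + \tfrac{1}{32} \sum_{i<l} h_i^2 h_l^2 \int f_{ii} f_{ll}\,dx.$$

Also, combining (2.8) and (2.14),

$$\int_{I(k)} \mathrm{Var}\{\hat f(x)\}\,dx \doteq (n h_1 \cdots h_d)^{-1} f(a)\, (\tfrac23)^d\, h_1 \cdots h_d - \tfrac1n f(a)^2\, h_1 \cdots h_d, \tag{2.16}$$

so that

$$\int \mathrm{Var}\{\hat f(x)\}\,dx \doteq (\tfrac23)^d (n h_1 \cdots h_d)^{-1} - \tfrac1n \int f^2\,dx. \tag{2.17}$$

Hence an asymptotic IMSE approximation has been obtained; see Theorem 1 below.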
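
The numerical constants in these cell integrals can be checked with exact rational arithmetic. The sketch below assumes the per-coordinate bias factor $\tfrac16 h_i^2 - \tfrac12 (x_i - a_i)^2$ of (2.15) and the product form of $\sum_j c_j(x)^2$, and works in units where $h_i = 1$, so that $t = (x_i - a_i)/h_i$ is uniform on $(-\tfrac12, \tfrac12]$. The variable names are choices made for this example only.

```python
from fractions import Fraction as F

# Moments of t = (x_i - a_i)/h_i, uniform on (-1/2, 1/2].
m2 = F(1, 12)   # E t^2
m4 = F(1, 80)   # E t^4

# Per-coordinate bias factor b(t) = 1/6 - t^2/2, in units of h_i^2.
mean_b = F(1, 6) - m2 / 2                             # average of b(t)
mean_b_sq = F(1, 6) ** 2 - F(1, 6) * m2 + m4 / 4      # average of b(t)^2

# Coefficient of h_i^2 h_l^2 * f_ii f_ll when the sum runs over pairs i < l.
cross_pairs = 2 * mean_b ** 2

# Integral over u in [0, 1] of (1 - u)^2 + u^2, the per-coordinate variance factor.
var_factor = F(1, 3) + F(1, 3)

print(mean_b, mean_b_sq, cross_pairs, var_factor)
```

The four printed values are the factors that multiply $h_i^2$, $h_i^4$, $h_i^2 h_l^2$ and $(n h_1 \cdots h_d)^{-1}$ per coordinate in the integrated bias and variance.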

Remark 1. The choice (2.10) was arrived at by the writer as a solution to the equations
$\sum_j (-1)^{j_i+1} c_j(x) = 2(x_i - a_i)/h_i$, $\sum_j (-1)^{j_i+j_l} c_j(x) = 4(x_i - a_i)(x_l - a_l)/(h_i h_l)$, $i \ne l$,
aiming at making the bias (2.7) have as small order as possible. He has later learned that in general,
approximating a function in a rectangle by linear interpolation of its corner values, with weights
as in (2.9), (2.10), is known in numerical analysis circles as linear blend interpolation. The GFP
we propose is accordingly the linear blend of the ordinary histogram. (Note that the weights
themselves are not linear in $x$.)
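
To make the linear blend concrete, here is a minimal sketch that computes the $2^d$ weights as products of $u_i$ or $1 - u_i$ per coordinate and blends given corner heights. The function names `blend_weights` and `gfp_value`, and the dictionary layout keyed by corner tuples, are assumptions for this illustration.

```python
import itertools
import numpy as np

def blend_weights(u):
    """Linear blend weights c_j for all corners j in {0,1}^d.

    u : length-d array with entries in [0, 1], the local coordinates u_i.
    Returns a dict mapping each corner tuple j to its weight.
    """
    u = np.asarray(u, dtype=float)
    weights = {}
    for j in itertools.product((0, 1), repeat=len(u)):
        j_arr = np.array(j)
        # Factor u_i when j_i = 1, factor (1 - u_i) when j_i = 0.
        weights[j] = float(np.prod(np.where(j_arr == 1, u, 1.0 - u)))
    return weights

def gfp_value(corner_values, u):
    """Blend the 2^d histogram heights at the corners, given local coordinates u."""
    c = blend_weights(u)
    return sum(c[j] * corner_values[j] for j in c)
```

For example, with `u = [0.5, 0.5]` every weight equals 1/4, so `gfp_value` returns the plain average of the four corner heights, in line with the requirement at the centre point.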

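2.4. IMAD and IAB for the GFP.

We can in addition to IMSE study a criterion related to the $L_1$ distance $\int |\hat f - f|\,dx$, which
in some ways is a more natural measure; see Devroye and Gyorfi (1985). The expected $L_1$
distance itself proves to be rather intractable mathematically, so we shall be content to study
the natural and statistically meaningful upper bound

$$E \int \{ |\hat f(x) - E\hat f(x)| + |E\hat f(x) - f(x)| \}\,dx = \int [\mathrm{mad}\{\hat f(x)\} + |\mathrm{bias}(x)|]\,dx = \mathrm{IMAD} + \mathrm{IAB}, \tag{2.18}$$

writing $\mathrm{mad}\{\hat f(x)\} = E|\hat f(x) - E\hat f(x)|$ for the mean absolute deviation and $|\mathrm{bias}(x)|$ for the
absolute bias. Note the similarity of (2.18) to the traditional criterion (2.11).

Sometimes we shall take interest in IMAD and IAB evaluated over some bounded region
instead of over all of $\mathbb{R}^d$.

We shall in fact sometimes only give upper bounds for $\int |\mathrm{bias}(x)|\,dx$, since exact calculations
tend to be difficult (but possible, as opposed to the exact expected $L_1$ distance, which
borders on the impossible), and since the resulting expressions do not convey as useful information
as the upper bounds. For illustration of this point, consider
$\int_{(a - \frac12 h,\, a + \frac12 h]} |\sum_{i=1}^{d} b_i (x_i - a_i)|\,dx$.
This integral may be explicitly evaluated in terms of $b_1, \ldots, b_d$ and the widths $h_1, \ldots, h_d$, but
the answer is less informative and useful than the simple upper bound $\tfrac14 \sum_{i=1}^{d} |b_i| h_i\,(h_1 \cdots h_d)$.

Now consider IMAD and IAB for the chosen GFP. From (2.15) we get

$$\begin{aligned}
\int_{I(k)} |\mathrm{bias}(x)|\,dx &\doteq \int_{I(k)} \Bigl| \sum_{i=1}^{d} f_{ii}(a) \bigl\{ \tfrac16 h_i^2 - \tfrac12 (x_i - a_i)^2 \bigr\} \Bigr|\,dx \\
&\le \sum_{i=1}^{d} |f_{ii}(a)| \int_{(a - \frac12 h,\, a + \frac12 h]} \bigl\{ \tfrac16 h_i^2 - \tfrac12 (x_i - a_i)^2 \bigr\}\,dx
 = \tfrac18 \sum_{i=1}^{d} |f_{ii}(a)|\, h_i^2\, (h_1 \cdots h_d).
\end{aligned} \tag{2.19}$$

Furthermore,

$$\begin{aligned}
\hat f(x) - E\hat f(x) &= \sum_j c_j(x)\,(h_1 \cdots h_d)^{-1} \bigl\{ \tfrac1n Y_0(k;j) - p_0(j) \bigr\} \\
&= (n h_1 \cdots h_d)^{-1/2} \sum_j c_j(x)\, N_0(j) \Bigl[ \frac{p_0(j)\{1 - p_0(j)\}}{h_1 \cdots h_d} \Bigr]^{1/2} \\
&\doteq f(a)^{1/2} (n h_1 \cdots h_d)^{-1/2} \sum_j c_j(x)\, N_0(j),
\end{aligned}$$

writing

$$N_0(j) = N_0(k;j) = \frac{Y_0(k;j) - n p_0(j)}{[n p_0(j)\{1 - p_0(j)\}]^{1/2}}. \tag{2.20}$$

These variables are asymptotically independent and standard normally distributed by a triangular
and multivariate version of the Lindeberg theorem, as long as $n h_1 \cdots h_d \to \infty$. (Even
though $p_0(j) \to 0$ by (2.6) one still has $n p_0(j) \to \infty$.) They are also uniformly integrable,
since $E N_0(j)^2 = 1$. Hence

$$E \Bigl| \sum_j c_j(x) N_0(j) \Bigr| \doteq (\tfrac{2}{\pi})^{1/2} \Bigl\{ \sum_j c_j(x)^2 \Bigr\}^{1/2},$$

the right-hand side being the mean absolute value of a zero mean normal variable with variance $\sum_j c_j(x)^2$.
This suggests

$$\begin{aligned}
\int_{I(k)} \mathrm{mad}\{\hat f(x)\}\,dx &\doteq f(a)^{1/2} (n h_1 \cdots h_d)^{-1/2} (\tfrac{2}{\pi})^{1/2} \int_{I(k)} \Bigl\{ \sum_j c_j(x)^2 \Bigr\}^{1/2} dx \\
&= f(a)^{1/2} (n h_1 \cdots h_d)^{-1/2} (\tfrac{2}{\pi})^{1/2} \prod_{i=1}^{d} \int [\{1 - u_i(x)\}^2 + u_i(x)^2]^{1/2}\,dx_i \\
&= f(a)^{1/2} (n h_1 \cdots h_d)^{-1/2} (\tfrac{2}{\pi})^{1/2} \bigl\{ \tfrac12 + \tfrac{1}{2\sqrt2} \log(1 + \sqrt2) \bigr\}^d\, h_1 \cdots h_d
\end{aligned}$$

and a corresponding IMAD expression.

For fixed $n, h_1, \ldots, h_d$ there will always be many cells left with $n p_0(j) \doteq f(a)\, n h_1 \cdots h_d$
small, even if $n h_1 \cdots h_d$ is large, so we cannot expect for example $E|N_0(j)| \doteq (2/\pi)^{1/2}$ to be a
good approximation for all cells. (In fact, about the best one can get is
$\bigl| E|N_0(j)| - (2/\pi)^{1/2} \bigr| \le B/[n p_0(j)\{1 - p_0(j)\}]^{1/2}$ for some constant $B$; see the Appendix.)
Since no closed form expression for $E|\hat f(x) - E\hat f(x)|$ is available, the reasoning above can therefore
be expected to give good approximations for IMAD only if $f$ has bounded support, or if $\mathrm{mad}\{\hat f(x)\}$ is
integrated over a bounded region only.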
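
The per-coordinate constant in the IMAD approximation for the GFP is the integral of $\{(1-u)^2 + u^2\}^{1/2}$ over $[0,1]$. The sketch below compares a midpoint-rule value with the closed form $\tfrac12 + \log(1+\sqrt2)/(2\sqrt2)$; the function name and the number of quadrature steps are arbitrary choices for this illustration.

```python
import math

def mad_constant_numeric(steps=200_000):
    """Midpoint-rule value of the integral of sqrt((1-u)^2 + u^2) over [0, 1]."""
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) / steps
        total += math.sqrt((1.0 - u) ** 2 + u ** 2)
    return total / steps

# Closed form of the same integral.
closed_form = 0.5 + math.log(1.0 + math.sqrt(2.0)) / (2.0 * math.sqrt(2.0))
```

The constant is roughly 0.81 per coordinate, so the leading IMAD term shrinks geometrically with the dimension $d$ relative to $(n h_1 \cdots h_d)^{-1/2}$.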


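Theorem 1. Let the density $f$ in $\mathbb{R}^d$ have continuous derivatives $f_i$, $f_{ij}$, $f_{ijk}$, and let
$h_1 \to 0, \ldots, h_d \to 0$, but $n h_1 \cdots h_d \to \infty$. Then for the generalized frequency polygon defined
in (2.2), (2.10):

$$\begin{aligned}
\mathrm{IMSE} &= \int \mathrm{Var}\{\hat f(x)\}\,dx + \int \{\mathrm{bias}(x)\}^2\,dx \\
&= (\tfrac23)^d (n h_1 \cdots h_d)^{-1} - \tfrac1n \int f^2\,dx + \tfrac{49}{2880} \sum_{i=1}^{d} h_i^4 \int (f_{ii})^2\,dx
 + \tfrac{1}{32} \sum_{i<l} h_i^2 h_l^2 \int f_{ii} f_{ll}\,dx + \text{higher order terms};
\end{aligned}$$

$$\begin{aligned}
\mathrm{IMAD} + \mathrm{IAB} &= \int \mathrm{mad}\{\hat f(x)\}\,dx + \int |\mathrm{bias}(x)|\,dx \\
&\le (\tfrac{2}{\pi})^{1/2} \bigl\{ \tfrac12 + \tfrac{1}{2\sqrt2} \log(1 + \sqrt2) \bigr\}^d (n h_1 \cdots h_d)^{-1/2} \int f^{1/2}\,dx
 + \tfrac18 \sum_{i=1}^{d} h_i^2 \int |f_{ii}|\,dx + \text{higher order terms}.
\end{aligned}$$

The IMSE expression needs to have $\int f^2\,dx$, $\int (f_i)^2\,dx$, $\int (f_{ij})^2\,dx$, $\int (f_{ijk})^2\,dx$ finite. The
IMAD + IAB expression holds provided the integrals are evaluated over some fixed bounded
region contained in the interior of the support of $f$.

Some details pertaining to the proof of this theorem are given in the Appendix.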
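
As an illustration of how such an expression yields a window size guideline, consider $d = 1$, where the $h$-dependent leading terms are $\tfrac23 (nh)^{-1} + \tfrac{49}{2880} h^4 \int (f'')^2\,dx$. Setting the derivative with respect to $h$ to zero gives $h^5 = 480 / \{49\, n \int (f'')^2\,dx\}$. The sketch below is an illustration under assumptions: the function names and the argument `rough2` (standing for $\int (f'')^2\,dx$) are hypothetical, and the standard normal density is used only as a worked example.

```python
import math

def gfp_amise_1d(h, n, rough2):
    """Leading h-dependent IMSE terms for d = 1 (the -1/n term does not depend on h)."""
    return 2.0 / (3.0 * n * h) + 49.0 / 2880.0 * h ** 4 * rough2

def gfp_optimal_h_1d(n, rough2):
    """Minimizer of gfp_amise_1d over h: h^5 = 480 / (49 n rough2)."""
    return (480.0 / (49.0 * n * rough2)) ** 0.2

# Assumed example: for the standard normal density the integral of (f'')^2 is 3 / (8 sqrt(pi)).
R_NORMAL = 3.0 / (8.0 * math.sqrt(math.pi))
```

For the standard normal example the resulting binwidth is approximately $2.15\, n^{-1/5}$; for other densities the roughness integral would have to be supplied or estimated.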


Remark 2. The IMSE expression here is better than the one obtained by Scott (1985a,
equation (7.1)) for the case $d = 2$, since he used a less efficient choice of functions $c_{0,0}(x)$,
$c_{0,1}(x)$, $c_{1,0}(x)$, $c_{1,1}(x)$ than the linear blend weights (2.10).

Remark 3. If the support of $f$ is $\mathbb{R}^d$ then the IMAD + IAB expression holds over each
bounded region. (Actually, taking resort to bounded regions is only necessary for IMAD,
not for IAB.) If $f$ has compact support (but possibly defined only on its interior) then the
expression holds provided only that the functions $|f_i f_j|/f^{3/2}$ and $|f_{ij}|/f^{1/2}$ have finite integrals.

3.  Average  Shifted  Histograms 

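Consider again a histogram density estimate $\hat f_0$ of the form (1.1) in $\mathbb{R}^d$. Choose integers
$m_1, \ldots, m_d$ and consider the smaller binwidths

$$\delta_j = h_j/m_j, \qquad j = 1, \ldots, d. \tag{3.1}$$

$m_1 \cdots m_d$ histograms can be constructed by moving the grid of cells an amount $i_j \delta_j$,
$j = 1, \ldots, d$; $i_j = 0, 1, \ldots, m_j - 1$. Scott (1985b) proposes taking the average of these shifted
histograms, i.e.

$$\hat f_{\mathrm{ASH}}(x) = \frac{1}{m_1 \cdots m_d} \sum_{i_1=0}^{m_1-1} \cdots \sum_{i_d=0}^{m_d-1} \hat f_{i_1, \ldots, i_d\ \mathrm{shifted}}(x).$$

$\hat f_{\mathrm{ASH}}$ is constant on each of the many smaller cells of volume $\delta_1 \cdots \delta_d$. Single out one of these,
say $C_0 = (x_0 - \tfrac12 \delta,\, x_0 + \tfrac12 \delta] = \prod_{j=1}^{d} (x_{0,j} - \tfrac12 \delta_j,\, x_{0,j} + \tfrac12 \delta_j]$, and write
$Y(i) = Y(i_1, \ldots, i_d)$ for the number of $X_i$'s that fall in the cell

$$C_0(i) = \bigl(x_0 + (i - \tfrac12)\delta,\; x_0 + (i + \tfrac12)\delta\bigr]
= \prod_{j=1}^{d} \bigl(x_{0,j} + (i_j - \tfrac12)\delta_j,\; x_{0,j} + (i_j + \tfrac12)\delta_j\bigr]. \tag{3.2}$$

Then

$$\hat f_{\mathrm{ASH}}(x) = (h_1 \cdots h_d)^{-1} \sum_{i_1=1-m_1}^{m_1-1} \cdots \sum_{i_d=1-m_d}^{m_d-1}
\Bigl(1 - \frac{|i_1|}{m_1}\Bigr) \cdots \Bigl(1 - \frac{|i_d|}{m_d}\Bigr)\, Y(i_1, \ldots, i_d)/n \tag{3.3}$$

for $x$ in the particular cell $C_0$; see Scott (1985b, Section 2).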
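
For $d = 1$ the equivalence between the weighted sum over fine cells and the literal average of $m$ shifted histograms can be checked directly. The sketch below is a minimal illustration assuming the fine-cell counts are given as an array of length at least $2m - 1$ and that counts outside the array are zero; the function names are chosen for this example.

```python
import numpy as np

def ash_1d(fine_counts, n, h, m):
    """ASH heights on the fine cells for d = 1, using triangular weights 1 - |i|/m.

    fine_counts : counts Y(i) on consecutive cells of width delta = h / m
                  (assumed to have length at least 2m - 1)
    """
    i = np.arange(1 - m, m)
    w = 1.0 - np.abs(i) / m                      # triangular weights
    # mode="same" keeps one value per fine cell; counts outside the array are zero.
    return np.convolve(fine_counts, w, mode="same") / (n * h)

def ash_1d_by_averaging(fine_counts, n, h, m):
    """Same estimate, computed literally as the average of m shifted histograms."""
    y = np.asarray(fine_counts, dtype=float)
    pad = np.concatenate([np.zeros(m - 1), y, np.zeros(m - 1)])
    out = np.zeros(len(y))
    for k in range(len(y)):
        for r in range(m):                       # position of cell k inside its coarse bin
            start = k + (m - 1) - r              # index in `pad` of the bin's first fine cell
            out[k] += pad[start:start + m].sum() / (n * h)
    return out / m
```

The convolution form needs only one pass over the fine-cell counts, which is the computational advantage when the estimate must be evaluated many times.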

Scott (op. cit.) finds IMSE expressions for $d = 1, 2$, but notes that explicit multivariate
IMSE results are not generally available. His results are complemented below with such
d-dimensional results. The present treatment will differ only mildly from his.

3.1.  Bias  of  the  ASH. 

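Write

$$w(i) = w(i_1, \ldots, i_d) = \Bigl(1 - \frac{|i_1|}{m_1}\Bigr) \cdots \Bigl(1 - \frac{|i_d|}{m_d}\Bigr) \tag{3.4}$$

for the weights appearing in (3.3), and let $p(i) = p(i_1, \ldots, i_d) = \int_{C_0(i)} f\,dx$. Then from (3.3)

$$E\hat f_{\mathrm{ASH}}(x) = (h_1 \cdots h_d)^{-1} \sum_{1 - m_j \le i_j \le m_j - 1} w(i)\, p(i),$$

i.e. the exact expectation involves $f$-probabilities for $(2m_1 - 1) \cdots (2m_d - 1)$ cells. They are
all contained in a cell with volume $2h_1 \cdots 2h_d$ around $x_0$, however, so approximations to
$E\hat f_{\mathrm{ASH}}(x)$ based on Taylor expansion of $f$ around $x_0$ may still be accurate enough. We
get

$$\begin{aligned}
p(i_1, \ldots, i_d)/(\delta_1 \cdots \delta_d) &= \int_{(x_0 + (i - \frac12)\delta,\; x_0 + (i + \frac12)\delta]} f(x)\,dx \big/ (\delta_1 \cdots \delta_d) \\
&\doteq f(x_0) + \sum_{j=1}^{d} \bigl[ f_j(x_0)\, i_j \delta_j + \tfrac12 f_{jj}(x_0)\,(i_j^2 + \tfrac{1}{12})\,\delta_j^2 \bigr]
 + \tfrac12 \sum_{j \ne l} f_{jl}(x_0)\, i_j i_l\, \delta_j \delta_l \\
&\quad + \tfrac16 \sum_{j=1}^{d} f_{jjj}(x_0)\,(i_j^3 + \tfrac14 i_j)\,\delta_j^3
 + \tfrac12 \sum_{j \ne l} f_{jjl}(x_0)\,(i_j^2 + \tfrac{1}{12})\, i_l\, \delta_j^2 \delta_l
 + \tfrac16 \sum_{j,l,s\ \mathrm{distinct}} f_{jls}(x_0)\, i_j i_l i_s\, \delta_j \delta_l \delta_s.
\end{aligned} \tag{3.5}$$

Notice for the following that $w(i)/(m_1 \cdots m_d) = \prod_{j=1}^{d} m_j^{-1}(1 - |i_j|/m_j)$ defines a probability
distribution for $(i_1, \ldots, i_d)$ over $\prod_{j=1}^{d} \{1 - m_j, \ldots, 0, \ldots, m_j - 1\}$, with $i_1, \ldots, i_d$ independent,
with odd moments equal to zero, and with $E(i_j)^2 = \tfrac16 (m_j^2 - 1)$. Using this we obtain, for
$x \in C_0$,

$$E\hat f_{\mathrm{ASH}}(x) = \sum_{i_1, \ldots, i_d} \frac{w(i)}{m_1 \cdots m_d}\, \frac{p(i)}{\delta_1 \cdots \delta_d}. \tag{3.6}$$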
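
The moment claims about the weight distribution are easy to confirm exactly for small $m$. The following sketch treats one coordinate, with probabilities $(m - |i|)/m^2$ on $i = 1 - m, \ldots, m - 1$; the function name is an assumption for this example.

```python
from fractions import Fraction as F

def ash_weight_moments(m):
    """Total mass, first and second moments of w(i)/m on i = 1-m, ..., m-1 (d = 1)."""
    probs = {i: F(m - abs(i), m * m) for i in range(1 - m, m)}
    total = sum(probs.values())
    first = sum(i * p for i, p in probs.items())
    second = sum(i * i * p for i, p in probs.items())
    return total, first, second
```

Calling `ash_weight_moments(m)` for several values of `m` should return total mass 1, first moment 0, and second moment $(m^2 - 1)/6$.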
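After subtracting a similar expression for $f(x)$ we obtain an approximation for the bias:

$$\begin{aligned}
\mathrm{bias}(x) &\doteq -\sum_{j=1}^{d} f_j(x_0)(x_j - x_{0,j})
 + \sum_{j=1}^{d} f_{jj}(x_0) \bigl\{ \tfrac{1}{12}(m_j^2 - \tfrac12)\delta_j^2 - \tfrac12 (x_j - x_{0,j})^2 \bigr\} \\
&\quad - \tfrac12 \sum_{j \ne l} f_{jl}(x_0)(x_j - x_{0,j})(x_l - x_{0,l})
 - \tfrac16 \sum_{j,l,s} f_{jls}(x_0)(x_j - x_{0,j})(x_l - x_{0,l})(x_s - x_{0,s}).
\end{aligned}$$

Hence

$$\begin{aligned}
\{\mathrm{bias}(x)\}^2 &\doteq \Bigl[ \sum_{j=1}^{d} f_j(x_0)(x_j - x_{0,j}) \Bigr]^2
 + \Bigl[ \sum_{j=1}^{d} f_{jj}(x_0) \bigl\{ \tfrac{1}{12}(m_j^2 - \tfrac12)\delta_j^2 - \tfrac12 (x_j - x_{0,j})^2 \bigr\} \Bigr]^2 \\
&\quad - 2 \Bigl\{ \sum_{j=1}^{d} f_j(x_0)(x_j - x_{0,j}) \Bigr\}
 \Bigl\{ \sum_{j=1}^{d} f_{jj}(x_0) \bigl\{ \tfrac{1}{12}(m_j^2 - \tfrac12)\delta_j^2 - \tfrac12 (x_j - x_{0,j})^2 \bigr\} \Bigr\} \\
&\quad + \Bigl[ \tfrac12 \sum_{j \ne l} f_{jl}(x_0)(x_j - x_{0,j})(x_l - x_{0,l}) \Bigr]^2
 + \Bigl\{ \sum_{j=1}^{d} f_j(x_0)(x_j - x_{0,j}) \Bigr\} \Bigl\{ \sum_{j \ne l} f_{jl}(x_0)(x_j - x_{0,j})(x_l - x_{0,l}) \Bigr\} \\
&\quad - \Bigl\{ \sum_{j=1}^{d} f_{jj}(x_0) \bigl\{ \tfrac{1}{12}(m_j^2 - \tfrac12)\delta_j^2 - \tfrac12 (x_j - x_{0,j})^2 \bigr\} \Bigr\}
 \Bigl\{ \sum_{j \ne l} f_{jl}(x_0)(x_j - x_{0,j})(x_l - x_{0,l}) \Bigr\} \\
&\quad + \tfrac13 \Bigl\{ \sum_{j=1}^{d} f_j(x_0)(x_j - x_{0,j}) \Bigr\}
 \Bigl\{ \sum_{j,l,s} f_{jls}(x_0)(x_j - x_{0,j})(x_l - x_{0,l})(x_s - x_{0,s}) \Bigr\} \\
&\quad - \tfrac13 \Bigl\{ \sum_{j=1}^{d} f_{jj}(x_0)\,\tfrac{1}{12}(m_j^2 - \tfrac12)\delta_j^2 \Bigr\}
 \Bigl\{ \sum_{j,l,s} f_{jls}(x_0)(x_j - x_{0,j})(x_l - x_{0,l})(x_s - x_{0,s}) \Bigr\},
\end{aligned} \tag{3.7}$$

shaving off higher order terms.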
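Next integrate this over $C_0 = (x_0 - \tfrac12 \delta,\, x_0 + \tfrac12 \delta]$. Many details later one gets

$$\begin{aligned}
\int_{C_0} \{\mathrm{bias}(x)\}^2\,dx &\doteq \Bigl[ \tfrac{1}{12} \sum_{j=1}^{d} f_j(x_0)^2\, \delta_j^2
 + \sum_{j=1}^{d} f_{jj}(x_0)^2\, \delta_j^4 \bigl\{ \tfrac{1}{144}(m_j^2 - \tfrac12)^2 - \tfrac{1}{144}(m_j^2 - \tfrac12) + \tfrac{1}{320} \bigr\} \\
&\quad + 2 \sum_{j<l} f_{jj}(x_0) f_{ll}(x_0)\, \delta_j^2 \delta_l^2\, \tfrac{1}{144} (m_j^2 - 1)(m_l^2 - 1)
 + \tfrac{1}{144} \sum_{j<l} f_{jl}(x_0)^2\, \delta_j^2 \delta_l^2 \\
&\quad + \tfrac{1}{240} \sum_{j=1}^{d} f_j(x_0) f_{jjj}(x_0)\, \delta_j^4
 + \tfrac{1}{144} \sum_{j \ne l} f_j(x_0) f_{jll}(x_0)\, \delta_j^2 \delta_l^2 \Bigr]\, \delta_1 \cdots \delta_d.
\end{aligned}$$

Summing up over all cells and using $\delta_j = h_j/m_j$ we arrive at

$$\begin{aligned}
\int \{\mathrm{bias}(x)\}^2\,dx &\doteq \sum_{j=1}^{d} \frac{h_j^2}{12 m_j^2} \int (f_j)^2\,dx
 + \sum_{j=1}^{d} h_j^4 \Bigl\{ \tfrac{1}{144} \bigl(1 - \tfrac{1}{2m_j^2}\bigr)^2 - \tfrac{1}{144 m_j^2} \bigl(1 - \tfrac{1}{2m_j^2}\bigr) + \tfrac{1}{320 m_j^4} \Bigr\} \int (f_{jj})^2\,dx \\
&\quad + \tfrac{1}{72} \sum_{j<l} h_j^2 h_l^2 \bigl(1 - \tfrac{1}{m_j^2}\bigr)\bigl(1 - \tfrac{1}{m_l^2}\bigr) \int f_{jj} f_{ll}\,dx
 + \sum_{j<l} \frac{h_j^2 h_l^2}{144\, m_j^2 m_l^2} \int (f_{jl})^2\,dx \\
&\quad + \sum_{j=1}^{d} \frac{h_j^4}{240\, m_j^4} \int f_j f_{jjj}\,dx
 + \sum_{j \ne l} \frac{h_j^2 h_l^2}{144\, m_j^2 m_l^2} \int f_j f_{jll}\,dx.
\end{aligned} \tag{3.8}$$

Somewhat hidden here is the surprising fact that $\tfrac{1}{12} \delta_j^2 \sum f_j(x_0)^2\, \delta_1 \cdots \delta_d$, where the sum is over
all cells, does not contribute to the $\delta_j^2 \delta_l^2$ or $\delta_j^4$ terms; in fact $\sum f_j(x_0)^2\, \delta_1 \cdots \delta_d$ equals
$\int (f_j)^2\,dx$ up to a remainder of sufficiently high order in $\delta_1, \ldots, \delta_d$, see the Appendix.

(3.8) can be simplified further if $f$ is assumed to go smoothly to zero at infinity so that
$\int \frac{\partial}{\partial x_j} \{f_j(x) f_{jj}(x)\}\,dx = 0$ (then $\int f_j f_{jjj}\,dx = -\int (f_{jj})^2\,dx$),
$\int \frac{\partial}{\partial x_l} \{f_j(x) f_{jl}(x)\}\,dx = 0$ (then $\int f_j f_{jll}\,dx = -\int (f_{jl})^2\,dx$),
and likewise $\int f_{jj} f_{ll}\,dx = \int (f_{jl})^2\,dx$ after two integrations by parts. In this
case

$$\begin{aligned}
\int \{\mathrm{bias}(x)\}^2\,dx &\doteq \sum_{j=1}^{d} \frac{h_j^2}{12 m_j^2} \int (f_j)^2\,dx
 + \sum_{j=1}^{d} \frac{h_j^4}{144} \Bigl( 1 - \frac{2}{m_j^2} + \frac{3}{5 m_j^4} \Bigr) \int (f_{jj})^2\,dx \\
&\quad + \tfrac{1}{72} \sum_{j<l} h_j^2 h_l^2 \Bigl( 1 - \frac{1}{m_j^2} - \frac{1}{m_l^2} + \frac{1}{2 m_j^2 m_l^2} \Bigr) \int (f_{jl})^2\,dx.
\end{aligned} \tag{3.9}$$

3.2. Variance of the ASH.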

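From (3.3), (3.4), (3.5) and using multinomial moment formulae one gets

$$\begin{aligned}
\mathrm{Var}\{\hat f_{\mathrm{ASH}}(x)\} &= (h_1 \cdots h_d)^{-2} \sum_i w(i)^2\, \tfrac1n\, p(i)\{1 - p(i)\}
 - (h_1 \cdots h_d)^{-2} \sum_{i \ne i'} w(i)\, w(i')\, \tfrac1n\, p(i)\, p(i') \\
&= (n h_1 \cdots h_d)^{-1} \sum_i \frac{w(i)^2}{m_1 \cdots m_d}\, \frac{p(i)}{\delta_1 \cdots \delta_d}
 - \frac1n \Bigl\{ \sum_i \frac{w(i)}{m_1 \cdots m_d}\, \frac{p(i)}{\delta_1 \cdots \delta_d} \Bigr\}^2 \\
&\doteq (n h_1 \cdots h_d)^{-1} \prod_{j=1}^{d} \Bigl( \frac23 + \frac{1}{3 m_j^2} \Bigr) f(x_0)
 - \frac1n \Bigl\{ f(x_0) + \sum_{j=1}^{d} f_{jj}(x_0)\, \tfrac{1}{12} (m_j^2 - \tfrac12)\, \delta_j^2 \Bigr\}^2.
\end{aligned}$$

This leads to

$$\int_{C_0} \mathrm{Var}\{\hat f_{\mathrm{ASH}}(x)\}\,dx \doteq (n h_1 \cdots h_d)^{-1} \prod_{j=1}^{d} \Bigl( \frac23 + \frac{1}{3 m_j^2} \Bigr) f(x_0)\, \delta_1 \cdots \delta_d
 - \frac1n f(x_0)^2\, \delta_1 \cdots \delta_d \tag{3.10}$$

and, combined with the conclusion of the preceding subsection, the IMSE expression reported
in Theorem 2 below.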
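
The per-coordinate variance factor $\tfrac23 + 1/(3m^2)$ is the sum of squared triangular weights divided by $m$. A minimal exact check, with a function name chosen for this example:

```python
from fractions import Fraction as F

def ash_variance_factor(m):
    """Sum over i of w(i)^2 / m with w(i) = 1 - |i|/m, i = 1-m, ..., m-1."""
    return sum(F(m - abs(i), m) ** 2 for i in range(1 - m, m)) / m
```

For $m = 1$ the factor is 1, the ordinary histogram value, and it decreases towards $\tfrac23$ as $m$ grows.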

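3.3. IMAD for the ASH.

We look for a suitable expression for $\mathrm{IMAD} = \int E|\hat f_{\mathrm{ASH}}(x) - E\hat f_{\mathrm{ASH}}(x)|\,dx$, and again
start out trying to evaluate the function over the single cell $C_0$, on which $\hat f_{\mathrm{ASH}}$ has constant
value $(h_1 \cdots h_d)^{-1} \sum_i w(i) Y(i)/n$, cf. (3.2)-(3.4). So

$$\begin{aligned}
\hat f_{\mathrm{ASH}}(x) - E\hat f_{\mathrm{ASH}}(x) &= (h_1 \cdots h_d)^{-1} \sum_i w(i) \bigl\{ \tfrac1n Y(i) - p(i) \bigr\} \\
&= (h_1 \cdots h_d)^{-1} n^{-1/2} \sum_i w(i)\, N(i)\, [p(i)\{1 - p(i)\}]^{1/2}
 = n^{-1/2} (\delta_1 \cdots \delta_d)^{-1/2} Z_{0,n},
\end{aligned} \tag{3.11}$$

where

$$Z_{0,n} = \sum_i \frac{w(i)}{m_1 \cdots m_d}\, N(i) \Bigl[ \frac{p(i)\{1 - p(i)\}}{\delta_1 \cdots \delta_d} \Bigr]^{1/2} \tag{3.12}$$

and

$$N(i) = \{Y(i) - n p(i)\} / [n p(i)\{1 - p(i)\}]^{1/2}. \tag{3.13}$$

It follows that

$$\int_{C_0} E|\hat f_{\mathrm{ASH}}(x) - E\hat f_{\mathrm{ASH}}(x)|\,dx = n^{-1/2} (\delta_1 \cdots \delta_d)^{1/2}\, E|Z_{0,n}|. \tag{3.14}$$

Now $Z_{0,n}$ has variance

$$\begin{aligned}
\mathrm{Var}\, Z_{0,n} &= \sum_i \Bigl( \frac{w(i)}{m_1 \cdots m_d} \Bigr)^2 \frac{p(i)\{1 - p(i)\}}{\delta_1 \cdots \delta_d}
 - \sum_{i \ne i'} \frac{w(i)\, w(i')}{(m_1 \cdots m_d)^2}\, \frac{p(i)\, p(i')}{\delta_1 \cdots \delta_d} \\
&= \frac{1}{m_1 \cdots m_d} \sum_i \frac{w(i)^2}{m_1 \cdots m_d}\, \frac{p(i)}{\delta_1 \cdots \delta_d}
 - \delta_1 \cdots \delta_d \Bigl\{ \sum_i \frac{w(i)}{m_1 \cdots m_d}\, \frac{p(i)}{\delta_1 \cdots \delta_d} \Bigr\}^2 \\
&\doteq f(x_0)\, \frac{1}{m_1 \cdots m_d} \sum_i \frac{w(i)^2}{m_1 \cdots m_d} - f(x_0)^2\, \delta_1 \cdots \delta_d,
\end{aligned}$$

using (3.5) once more. $n p(i) \doteq f(x_0)\, n \delta_1 \cdots \delta_d \to \infty$ under the standard assumptions. Hence
$N(i)$ is asymptotically standard normal, and the $\prod_{j=1}^{d} (2m_j - 1)$ $N(i)$'s that are involved in
$Z_{0,n}$ are asymptotically independent, as the variance computation above also indicates. Hence
$Z_{0,n}$ is approximately a zero mean normal random variable, and we should have

$$E|Z_{0,n}| \doteq (\tfrac{2}{\pi})^{1/2} (\mathrm{Var}\, Z_{0,n})^{1/2}$$

and accordingly, using (3.14),

$$\int_{C_0} \mathrm{mad}\{\hat f_{\mathrm{ASH}}(x)\}\,dx \doteq (n h_1 \cdots h_d)^{-1/2} (\tfrac{2}{\pi})^{1/2} \prod_{j=1}^{d} \Bigl( \frac23 + \frac{1}{3 m_j^2} \Bigr)^{1/2} f(x_0)^{1/2}\, \delta_1 \cdots \delta_d. \tag{3.15}$$

Summing over all cells

$$\int \mathrm{mad}\{\hat f_{\mathrm{ASH}}(x)\}\,dx \doteq (n h_1 \cdots h_d)^{-1/2} (\tfrac{2}{\pi})^{1/2} \prod_{j=1}^{d} \Bigl( \frac23 + \frac{1}{3 m_j^2} \Bigr)^{1/2} \int f^{1/2}\,dx. \tag{3.16}$$

The reasoning above was only suggestive, and several details must be carefully checked
in order to establish (3.16) in a precise fashion. These details are dealt with in the proof of
Theorem 2 in the Appendix.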


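3.4. IAB for the ASH.

We already have the expression (3.7) for $\mathrm{bias}(x) = E\hat f_{\mathrm{ASH}}(x) - f(x)$. If $m_j \ge 2$ then
$\tfrac{1}{12}(m_j^2 - \tfrac12)\delta_j^2 \ge \tfrac12 (x_j - x_{0,j})^2$ when $x \in C_0$. Hence

$$\begin{aligned}
\int_{C_0} |\mathrm{bias}(x)|\,dx &\le \int_{C_0} \Bigl| \sum_j f_j(x_0)(x_j - x_{0,j}) \Bigr|\,dx
 + \sum_j |f_{jj}(x_0)| \int_{C_0} \bigl\{ \tfrac{1}{12}(m_j^2 - \tfrac12)\delta_j^2 - \tfrac12 (x_j - x_{0,j})^2 \bigr\}\,dx \\
&\quad + \tfrac12 \int_{C_0} \Bigl| \sum_{j \ne l} f_{jl}(x_0)(x_j - x_{0,j})(x_l - x_{0,l}) \Bigr|\,dx
 + \tfrac16 \int_{C_0} \Bigl| \sum_{j,l,s} f_{jls}(x_0)(x_j - x_{0,j})(x_l - x_{0,l})(x_s - x_{0,s}) \Bigr|\,dx \\
&\le \tfrac14 \sum_{j=1}^{d} |f_j(x_0)|\, \delta_j\, \delta_1 \cdots \delta_d
 + \tfrac{1}{12} \sum_{j=1}^{d} |f_{jj}(x_0)|\, (m_j^2 - 1)\, \delta_j^2\, \delta_1 \cdots \delta_d \\
&\quad + \tfrac{1}{32} \sum_{j \ne l} |f_{jl}(x_0)|\, \delta_j \delta_l\, \delta_1 \cdots \delta_d
 + O\bigl((\delta_1 + \cdots + \delta_d)^3\bigr) \sum_{j,l,s} |f_{jls}(x_0)|\, \delta_1 \cdots \delta_d.
\end{aligned}$$

Accordingly

$$\mathrm{IAB} \le \tfrac14 \sum_{j=1}^{d} \delta_j \int |f_j|\,dx + \tfrac{1}{12} \sum_{j=1}^{d} \delta_j^2 (m_j^2 - 1) \int |f_{jj}|\,dx
 + \tfrac{1}{32} \sum_{j \ne l} \delta_j \delta_l \int |f_{jl}|\,dx,$$

ignoring higher order terms.

Theorem 2. Let $f$ have continuous partial derivatives to the fourth order, and assume
that all the functions $|f_j|$, $|f_{jl}|$, $|f_{jls}|$, $|f_{jlst}|$, and their squares, are integrable. Assume
also that $f$ goes to zero at infinity smoothly enough to ensure
$\int (f_{jl})^2\,dx = \int f_{jj} f_{ll}\,dx = -\int f_j f_{jll}\,dx$. Then for Scott's average shifted histogram
defined in (3.2) and (3.3), as $h_1 = m_1 \delta_1 \to 0, \ldots, h_d = m_d \delta_d \to 0$, $n h_1 \cdots h_d \to \infty$,

$$\begin{aligned}
\mathrm{IMSE} &= (n h_1 \cdots h_d)^{-1} (\tfrac23)^d \prod_{j=1}^{d} \Bigl( 1 + \frac{1}{2 m_j^2} \Bigr) - \frac1n \int f^2\,dx
 + \sum_{j=1}^{d} \frac{h_j^2}{12 m_j^2} \int (f_j)^2\,dx \\
&\quad + \sum_{j=1}^{d} \frac{h_j^4}{144} \Bigl( 1 - \frac{2}{m_j^2} + \frac{3}{5 m_j^4} \Bigr) \int (f_{jj})^2\,dx
 + \tfrac{1}{72} \sum_{j<l} h_j^2 h_l^2 \Bigl( 1 - \frac{1}{m_j^2} - \frac{1}{m_l^2} + \frac{1}{2 m_j^2 m_l^2} \Bigr) \int (f_{jl})^2\,dx \\
&\quad + \text{higher order terms}.
\end{aligned}$$

Furthermore, over each bounded region where $\int |f_{jl}|/f^{1/2}\,dx$ is finite,

$$\begin{aligned}
\mathrm{IMAD} + \mathrm{IAB} &\le (n h_1 \cdots h_d)^{-1/2} (\tfrac{2}{\pi})^{1/2} (\tfrac23)^{d/2} \prod_{j=1}^{d} \Bigl( 1 + \frac{1}{2 m_j^2} \Bigr)^{1/2} \int f^{1/2}\,dx
 + \tfrac14 \sum_{j=1}^{d} \frac{h_j}{m_j} \int |f_j|\,dx \\
&\quad + \tfrac{1}{12} \sum_{j=1}^{d} h_j^2 \Bigl( 1 - \frac{1}{m_j^2} \Bigr) \int |f_{jj}|\,dx
 + \tfrac{1}{32} \sum_{j \ne l} \frac{h_j}{m_j} \frac{h_l}{m_l} \int |f_{jl}|\,dx + \text{higher order terms}.
\end{aligned}$$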

4.  Generalized  Frequency  Polygons  of  Average  Shifted  Histograms 
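The average shifted histogram is

$$\hat f_{\mathrm{ASH}}(x) = (h_1 \cdots h_d)^{-1} \sum_{i_1, \ldots, i_d} w(i_1, \ldots, i_d)\, Y(i_1, \ldots, i_d)/n$$

for each $x \in C_0 = (x_0 - \tfrac12 \delta,\, x_0 + \tfrac12 \delta]$, where $Y(i_1, \ldots, i_d) = Y(i)$ is the number of observations
falling in $C_0(i_1, \ldots, i_d) = C_0(i) = (x_0 + (i - \tfrac12)\delta,\, x_0 + (i + \tfrac12)\delta]$. In order to construct the
generalized frequency polygon of $\hat f_{\mathrm{ASH}}$ we should interpolate between $2^d$ neighbour values of
$\hat f_{\mathrm{ASH}}$. Consider therefore the $2^d$ ASH-cells $C_0(j_1, \ldots, j_d) = (x_0 + (j - \tfrac12)\delta,\, x_0 + (j + \tfrac12)\delta]$,
$j_1, \ldots, j_d \in \{0, 1\}$, and shift attention to the inner cell

$$C_0' = (x_0,\, x_0 + \delta] = (a - \tfrac12 \delta,\, a + \tfrac12 \delta], \tag{4.1}$$

where $a = x_0 + \tfrac12 \delta$ is the centre point of this new GFP-ASH cell and at the same time in the
middle of $2^d$ old ASH-cells.

Figure 2. The GFP-ASH cell lies within $2^d$ ASH cells. $\hat f_{\mathrm{ASH}}(x)$ is defined, for each of these
cells, as a weighted average over $(2m_1 - 1) \cdots (2m_d - 1)$ ASH cells. This is illustrated with
$m_1 = 2$ and $m_2 = 3$ above.

Define, then, for $x$ in $C_0'$,

$$\begin{aligned}
\hat f_{\mathrm{GFP\text{-}ASH}}(x) &= \sum_j c_{j_1, \ldots, j_d}(x)\, \hat f_{\mathrm{ASH}}(x_{0,1} + j_1 \delta_1, \ldots, x_{0,d} + j_d \delta_d) \\
&= (h_1 \cdots h_d)^{-1} \sum_j c_j(x) \sum_{i_1=1-m_1}^{m_1-1} \cdots \sum_{i_d=1-m_d}^{m_d-1} w(i_1, \ldots, i_d)\, Y(j_1 + i_1, \ldots, j_d + i_d)/n \\
&= (h_1 \cdots h_d)^{-1} \sum_{l_1=1-m_1}^{m_1} \cdots \sum_{l_d=1-m_d}^{m_d} T(l_1, \ldots, l_d)\, Y(l_1, \ldots, l_d)/n,
\end{aligned} \tag{4.2}$$

where $w(i) = \prod_{v=1}^{d} (1 - |i_v|/m_v)$ as in Section 3, where

$$c_j(x) = (1 - u_1)^{1-j_1} u_1^{j_1} \cdots (1 - u_d)^{1-j_d} u_d^{j_d} \tag{4.3}$$

as in Section 2, but now with

$$u_v = u_v(x) = (x_v - x_{0,v})/\delta_v = \tfrac12 + (x_v - a_v)/\delta_v, \qquad v = 1, \ldots, d, \tag{4.4}$$

and where finally

$$T(l) = T(l_1, \ldots, l_d) = \sum_{i + j = l} c_j(x)\, w(i) = T_1(l_1) \cdots T_d(l_d), \tag{4.5}$$

with

$$\begin{aligned}
T_v(l_v) &= \sum_{\substack{1 - m_v \le i_v \le m_v - 1,\ j_v \in \{0,1\} \\ i_v + j_v = l_v}} (1 - u_v)^{1-j_v} u_v^{j_v} \Bigl( 1 - \frac{|i_v|}{m_v} \Bigr) \\
&= (1 - u_v) \Bigl( 1 - \frac{|l_v|}{m_v} \Bigr) + u_v \Bigl( 1 - \frac{|l_v - 1|}{m_v} \Bigr), \qquad l_v = 1 - m_v, \ldots, m_v.
\end{aligned} \tag{4.6}$$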

4.1.  Bias  of  the  GFP-ASH. 


The  expectation  and  variance  of  /gFP-ASH  upon  the  cell  probabilities  p(£)  = 

^Co{t)  f  These  were  studied  in  (3.5),  but  it  is  now  more  advantageous  to  Taylor  expand  / 
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around  x  =  a: 


P W  =  [  {  /(o)  +  (“)  (®i  -  0«)  +  ^  X)  /•?  (“)  ) 

J{a+{t-l)S,a+eS]  K  ^ 


»=1 


+  6  S  -  a,)  +  •  •  •  Ua: 

^  •  f 

=  / (®)^i  •  •  •  /i(a)  / 

i=l  •/((^-1)«.«] 

+  5  ^ /  a;?  til  +  i  ^  /.y  (a)  f  XiXj 

=  f{cL)Sx  •  •  •  «d  +  X;  /*•(“) (^-  -  o )^*-  (^1  •  ” 

i=l 

+  E5*W(«(-4  +  ?)«?  («i---s4 

»=1 

*7^y 


dx 


(4.7) 


Some  exercises  in  algebra  yield  2<(A)  =  m,-,  E^=i-m.(^'  ”  |)^»(A)  =  («t  - 

E^Li-m,(^<  -  A-  +  l)Ti{ii)  =  |(m?  +  l)my.  All  this  leads  to 


-®/gFP-ASh(®)  =  ^  X)  2’(£i,---,£d)p(4,---,^) 


A 


A 

/id 


.  1 

/(a)mi  •  •  -  md  +  ^  /.•(a)(«t  -  r)5,-  mi  •  •  • 


t=i 


+  ^  +  1)^*  mi  •  •  -mrf 

i=l 

+  ^  (“)(«*■  “  ^)(“j  -  mi  •  •  •  md 

•  «¥y 

=  /(«) + X)  -  “i) + X^ 

i-l  «=1  ^ 

+  2  “  ^y)> 

*¥j 


using  (4.4)  and  remembering  Sj  =  hjfnijy  j  =  Subtracting  a  Taylor  approximation 
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for  f{x)  we  have  arrived  at 


bias(x)  =  -®/gFP-ASh(®)  “  /(®) 


We  could  actually  have  reached  (4,8)  more  directly,  perhaps  recycling  the  efforts  of  Section 
3  better,  cf.  (3.5)  and  (3.6),  but  the  representation  (4.2)  is  in  any  case  needed  in  Section  4.2 
below. 


From  (4.8)  one  gets 

^x^j}dx 


Summing  over  all  cells  and  approximating  integrals  with  Riemann  sums  as  usual,  and  using 
Sj  =  hj/rrijy  we  reach 


4.2.  Variance  of  the  GFP-ASH. 

/gFP-ASH(^)  ^  ^  linear  combination  of  /aSH(®o  values,  and  its  variemce  involves 
cov{/^Sh(^o  +  i^),  /aSH(®o  ^  is  most  convenient  to  use  the  repre- 


[  j  {hias{x)ydx^Y^fii{aY  f  +  ia:?}2dx 

J{a-^S.a+^S]  ^  2 

+ + 1)^/  - 


=  E  +  "».•  +  ^)(^i  •  •  • 


f=l 


+ K-  +  Ix”*? + 5)(«i  •  • 
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sentation  (4.2).  We  get,  for  x  G  (a-i5,  a  +  i5], 


Var{/QFp.AgIj(x)}  =  n  ^  ^  r(£)VW{l  -  pW} 


-  n-HAi  •  •  •  Aj)-*  X;  mT{t:)pit)p(if) 


n^f  pw 

mi  •  •  •  m^i  Si  •  •  ‘Sd 


(4.10) 


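One may show that

$$\sum_{\ell_i=1-m_i}^{m_i} T_i(\ell_i)^2 = \{2m_i^2 + 1 - 6u_i(1-u_i)\}/3m_i.$$

Hence, since $p(\ell)/(\delta_1\cdots\delta_d) \approx f(a)$ by (4.7),

$$
\begin{aligned}
\int_{(a-\frac12\delta,\,a+\frac12\delta]} \sum_\ell \frac{T(\ell)^2\,p(\ell)}{m_1\cdots m_d\,\delta_1\cdots\delta_d}\,dx
&\approx f(a)\int \prod_{i=1}^d \frac{2m_i^2+1-6u_i(1-u_i)}{3m_i^2}\,dx \\
&= \prod_{i=1}^d \bigl[\{(2m_i^2 + 1 - 1)/3m_i^2\}\,\delta_i\bigr]\, f(a) \\
&= (\tfrac23)^d f(a)\,\delta_1\cdots\delta_d.
\end{aligned}
\tag{4.11}
$$

Also, $\sum_\ell T(\ell)p(\ell) \approx m_1\cdots m_d\, f(a)\,\delta_1\cdots\delta_d$, so that $\{(h_1\cdots h_d)^{-1}\sum_\ell T(\ell)p(\ell)\}^2 \approx f(a)^2$. It follows that

$$
\int_{(a-\frac12\delta,\,a+\frac12\delta]} \mathrm{Var}\{\hat f_{\mathrm{GFP\text{-}ASH}}(x)\}\,dx
\approx (\tfrac23)^d (nh_1\cdots h_d)^{-1} f(a)\,\delta_1\cdots\delta_d - n^{-1} f(a)^2\,\delta_1\cdots\delta_d.
\tag{4.12}
$$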
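
The weight identities behind (4.10) to (4.12) can be checked numerically. The Python sketch below is not from the paper: it assumes that the one-dimensional weights have the form $T_i(\ell) = (1-u)\,w(\ell) + u\,w(\ell-1)$ with triangular ASH weights $w(j) = 1 - |j|/m$, which is one reading consistent with the sums quoted in this section. All function names are hypothetical.

```python
def tri_weight(j, m):
    """Triangular ASH weight w(j) = 1 - |j|/m, zero outside |j| < m (assumed form)."""
    return max(0.0, 1.0 - abs(j) / m)


def T(l, u, m):
    """Assumed interpolated weight: linear blend of two neighbouring ASH weight sets."""
    return (1.0 - u) * tri_weight(l, m) + u * tri_weight(l - 1, m)


def weight_sums(u, m):
    """Return the four sums over l = 1-m, ..., m that enter the bias and variance."""
    ls = range(1 - m, m + 1)
    s0 = sum(T(l, u, m) for l in ls)
    s1 = sum((l - 0.5) * T(l, u, m) for l in ls)
    s2 = sum((l * l - l + 1.0 / 3.0) * T(l, u, m) for l in ls)
    sq = sum(T(l, u, m) ** 2 for l in ls)
    return s0, s1, s2, sq


def expected_sums(u, m):
    """Closed forms quoted in the text for the same four sums."""
    return (
        float(m),
        (u - 0.5) * m,
        (m * m + 1.0) * m / 6.0,
        (2.0 * m * m + 1.0 - 6.0 * u * (1.0 - u)) / (3.0 * m),
    )


for m in (1, 2, 3, 6):
    for u in (0.0, 0.25, 0.5, 1.0):
        print(m, u, weight_sums(u, m), expected_sums(u, m))
```

Averaging the last sum over $u \in (0,1)$ gives $2m/3$, which is where the factor $(\tfrac23)^d$ in (4.11) comes from.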


4.3. IMAD of the GFP-ASH.


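From (4.2)

$$
\begin{aligned}
\hat f_{\mathrm{GFP\text{-}ASH}}(x) - E\hat f_{\mathrm{GFP\text{-}ASH}}(x)
&= (h_1\cdots h_d)^{-1}\sum_\ell T(\ell)\{n^{-1}Y(\ell) - p(\ell)\} \\
&= (nh_1\cdots h_d)^{-1/2}\sum_\ell T(\ell)\,
\frac{Y(\ell)-np(\ell)}{[np(\ell)\{1-p(\ell)\}]^{1/2}}\;
\frac{[p(\ell)\{1-p(\ell)\}]^{1/2}}{(h_1\cdots h_d)^{1/2}} \\
&= (nh_1\cdots h_d)^{-1/2}\,Z(x),
\end{aligned}
$$

say, $x \in (a-\tfrac12\delta,\, a+\tfrac12\delta]$, where, reasoning as in 2.4 and 3.3, $Z(x)$ is approximately normal with zero mean and variance

$$
\sigma(x)^2 \approx \sum_\ell \frac{T(\ell)^2\,p(\ell)}{h_1\cdots h_d}
= \sum_\ell \frac{T(\ell)^2\,p(\ell)}{m_1\cdots m_d\,\delta_1\cdots\delta_d}
\approx f(a)\prod_{i=1}^d \frac{2m_i^2+1-6u_i(1-u_i)}{3m_i^2}.
$$

This suggests $E|Z(x)| \approx (2/\pi)^{1/2}\sigma(x)$ and

$$
\begin{aligned}
\int_{(a-\frac12\delta,\,a+\frac12\delta]} E\bigl|\hat f_{\mathrm{GFP\text{-}ASH}}(x) - E\hat f_{\mathrm{GFP\text{-}ASH}}(x)\bigr|\,dx
&\approx (nh_1\cdots h_d)^{-1/2}(2/\pi)^{1/2} f(a)^{1/2}\int \prod_{i=1}^d \bigl[\{2m_i^2+1-6u_i(1-u_i)\}/(3m_i^2)\bigr]^{1/2}dx \qquad (4.13)\\
&= (nh_1\cdots h_d)^{-1/2}(2/\pi)^{1/2} f(a)^{1/2}\,(\delta_1\cdots\delta_d)\,J(m_1)\cdots J(m_d),
\end{aligned}
$$

where

$$
\begin{aligned}
J(m_i) &= \int_0^1 \{2m_i^2 + 1 - 6u(1-u)\}^{1/2}\,du \big/ \sqrt3\,m_i \\
&= \Bigl(\frac16 + \frac{1}{12m_i^2}\Bigr)^{1/2} + \frac{4m_i^2-1}{6\sqrt2\,m_i}\,\log\frac{3^{1/2} + (4m_i^2+2)^{1/2}}{(4m_i^2-1)^{1/2}}.
\end{aligned}
\tag{4.14}
$$

($J(m_i)$ is amazingly close to its limit value $(\tfrac23)^{1/2}$ already for $m_i = 3$.) Accordingly

$$\mathrm{IMAD} \approx (nh_1\cdots h_d)^{-1/2}(2/\pi)^{1/2}\int f^{1/2}dx\; J(m_1)\cdots J(m_d). \tag{4.15}$$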
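
The closeness of $J(m)$ to $(\tfrac23)^{1/2}$ is easy to confirm numerically. The following Python sketch is an illustration only, not part of the paper: it evaluates the defining integral of (4.14) by a midpoint rule and compares it with the closed form, written here as $J(m) = (\tfrac16 + \tfrac1{12m^2})^{1/2} + \frac{4m^2-1}{6\sqrt2 m}\log\{(3^{1/2} + (4m^2+2)^{1/2})/(4m^2-1)^{1/2}\}$. The function names are hypothetical.

```python
import math


def J_closed(m):
    """Closed form of J(m) as read from (4.14)."""
    first = math.sqrt(1.0 / 6.0 + 1.0 / (12.0 * m * m))
    factor = (4.0 * m * m - 1.0) / (6.0 * math.sqrt(2.0) * m)
    ratio = (math.sqrt(3.0) + math.sqrt(4.0 * m * m + 2.0)) / math.sqrt(4.0 * m * m - 1.0)
    return first + factor * math.log(ratio)


def J_numeric(m, steps=20000):
    """Midpoint-rule value of int_0^1 {2m^2 + 1 - 6u(1-u)}^(1/2) du / (sqrt(3) m)."""
    total = 0.0
    for k in range(steps):
        u = (k + 0.5) / steps
        total += math.sqrt(2.0 * m * m + 1.0 - 6.0 * u * (1.0 - u))
    return total / steps / (math.sqrt(3.0) * m)


limit = math.sqrt(2.0 / 3.0)
for m in (1, 2, 3, 5, 10):
    print(m, J_closed(m), J_numeric(m), limit - J_closed(m))
```

For $m = 1$ the value coincides with the constant $\tfrac12 + (2\sqrt2)^{-1}\log(1+\sqrt2)$ that appears for the plain GFP in (A.17).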
4.4. IAB of the GFP-ASH.


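From (4.8) it is clear that

$$
\int_{(a-\frac12\delta,\,a+\frac12\delta]} |\mathrm{bias}(x)|\,dx \le \sum_{i=1}^d |f_{ii}(a)| \int_{(-\frac12\delta,\,\frac12\delta]} \tfrac12\bigl\{\tfrac16(m_i^2+1)\delta_i^2 - x_i^2\bigr\}\,dx
$$

and in its turn

$$\mathrm{IAB} \le \sum_{i=1}^d \tfrac1{12}\,h_i^2\Bigl(1 + \frac{1}{2m_i^2}\Bigr)\int |f_{ii}|\,dx.$$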
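
As a small worked check of the constants, the Python sketch below (an illustration, not the paper's own computation) averages the one-coordinate bias term of (4.8), taken here as $b(v) = \tfrac12\{\tfrac16(m^2+1)\delta^2 - v^2\}$ with $f_{ii}(a)$ normalised to one and $v$ the offset from the cell centre, over a cell of width $\delta = h/m$. The average of $|b|$ should equal $\tfrac1{12}h^2(1 + \tfrac1{2m^2})$ and the average of $b^2$ should equal $\tfrac1{144}h^4(1 + \tfrac1{m^2} + \tfrac9{20m^4})$. The function names are hypothetical.

```python
def averaged_bias_terms(m, h, steps=20000):
    """Midpoint-rule averages of |b| and b**2 over one narrow cell of width h/m."""
    delta = h / m
    c = (m * m + 1.0) / 6.0 * delta * delta
    abs_total = 0.0
    sq_total = 0.0
    for k in range(steps):
        v = ((k + 0.5) / steps - 0.5) * delta  # offset from the cell centre
        b = 0.5 * (c - v * v)
        abs_total += abs(b)
        sq_total += b * b
    return abs_total / steps, sq_total / steps


def iab_factor(m, h):
    """Coefficient of the integral of |f_ii| in the IAB bound."""
    return h * h / 12.0 * (1.0 + 1.0 / (2.0 * m * m))


def isb_factor(m, h):
    """Coefficient of the integral of (f_ii)^2 in the integrated squared bias."""
    return h ** 4 / 144.0 * (1.0 + 1.0 / (m * m) + 9.0 / (20.0 * m ** 4))


for m in (1, 2, 5):
    print(m, averaged_bias_terms(m, 0.7), iab_factor(m, 0.7), isb_factor(m, 0.7))
```

With $m = 1$ the two coefficients reduce to $h^2/8$ and $49h^4/2880$, the familiar univariate frequency polygon values.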

Some  of  the  details  that  remain  in  order  to  actually  prove  the  following  theorem  are  available 
in  the  Appendix. 


Theorem  3.  For  the  generalized  frequency  polygon  of  the  average  shifted  histogram: 
IMSE  =  • .  •  hi)-^  “  ^ 

+ E  (i  ^  ^  °  (S  -  4  ' 

IMAD  +  lAB  ■<  (nhi  •  •  y  J(mi)  •  •  •  J(md) 


+ 


-fE§-E 


hi 


+ 


m*  4^  (nhi  •  ■  •  nhx  •  •  ’hd 


It is assumed that $f$ has third order continuous derivatives. The IMSE expression holds when
ifijk)^  have  finite  integrals.  The  IMAD  +  lAB  expression  holds  over  each 
bounded  region  where  /,  \fi\,  \  fij\,  \  fijj,  \,  \  'fij\/ all  have  finite  integrals. 


We remark that the IMSE expression obtained here is better than the one obtained (for $d = 2$) in Scott (1985b, Theorem 4), for his version of bivariate frequency polygons of average shifted histograms.


5. Discussion.


We have obtained results for IMSE and for IMAD + IAB for natural generalizations of histograms. Some consequences of these results will be briefly discussed in this section and comparisons with kernel type density estimators will be made.

The papers of Scott (1985a, 1985b) have provided the inspiration for the present paper. We have been able to improve and generalize his results somewhat, but the basic statistical and practical issues remain the same, and Scott's discussion of these points is valid also for this paper's density estimators, with few exceptions and minor modifications. The reader is therefore referred to the above mentioned articles for fuller discussion.


5.1.  Comparison  with  kernel  density  estimators. 


The kernel density estimators are the most usual alternatives to histograms. They are of the form

$$f^*(x) = (nh_1\cdots h_d)^{-1}\sum_{i=1}^n K\Bigl(\frac{x_1 - X_{i,1}}{h_1},\ldots,\frac{x_d - X_{i,d}}{h_d}\Bigr), \tag{5.1}$$

with the kernel $K$ a function on $\mathbb{R}^d$. Usual requirements are that $K$ is nonnegative and integrates to one, and that it is symmetric: $K(u_1,\ldots,u_d) = K(e_1u_1,\ldots,e_du_d)$ for all $e_1,\ldots,e_d \in \{-1,1\}$. It is also customary to have $h_1 = \cdots = h_d$ and to employ product kernels, i.e. $K(u_1,\ldots,u_d) = K_0(u_1)\cdots K_0(u_d)$ for some univariate kernel $K_0$.

It is interesting to compare the results of Theorems 1, 2, and 3 with corresponding expressions for a kernel density estimator. If $K$ is nonnegative with integral one, and $\int u_iK(u)\,du = 0$, $\tau_i^2 = \int u_i^2K(u)\,du$, $\int u_iu_jK(u)\,du = 0$ for $i \ne j$, and $\int K(u)^2\,du$ is finite, then one can show that

IMSE  =  J  K^du  (nhi .  f^dx 

.=1  i:!ij  •' 

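$$\mathrm{IMAD} + \mathrm{IAB} \le (2/\pi)^{1/2}\Bigl(\int K^2\,du\Bigr)^{1/2}\int f^{1/2}dx\,(nh_1\cdots h_d)^{-1/2} + \sum_{i=1}^d \tfrac12\tau_i^2h_i^2\int |f_{ii}|\,dx. \tag{5.3}$$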

(5.2) is a natural generalization of the classical univariate result of Parzen (1962); see also Epanechnikov (1969).


(5.2) 
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(5.4) 


Let us also write down the results for the ordinary histogram $\hat f_0$ of (1.1):

IMSE  =  (nhi . .  -  ha)-'  f^dx  +  2  j  {fi)^dx, 

$$\mathrm{IMAD} + \mathrm{IAB} \le (2/\pi)^{1/2}\int f^{1/2}dx\,(nh_1\cdots h_d)^{-1/2} + \sum_{i=1}^d \tfrac14h_i\int |f_i|\,dx. \tag{5.5}$$

These can be obtained by the methods of Section 2.

The advantage of passing from $\hat f_0$ to the more sophisticated kernel estimator is that the bias is of a smaller order. The rate at which IMSE for the histogram goes to zero when the best values for $h_1,\ldots,h_d$ are used is $n^{-2/(d+2)}$, whereas IMSE of (5.2) can attain the rate $n^{-4/(d+4)}$. The same phenomenon is illustrated using the IMAD + IAB criterion. If $h_i = O(n^{-\alpha})$, $i = 1,\ldots,d$, then the best choice for $\alpha$ in the histogram case (5.5) can be seen to be $1/(d+2)$, and IMAD + IAB has rate $n^{-1/(d+2)}$. For the kernel estimator case (5.3) $\alpha = 1/(d+4)$ is best, giving IMAD + IAB a rate of $n^{-2/(d+4)}$.

It is clear from these considerations and from Theorems 1, 2, and 3 that both the GFP and the GFP-ASH achieve the same favourable rate as the kernel estimator, i.e. $n^{-4/(d+4)}$ for the expected $L_2$ distance and $n^{-2/(d+4)}$ for the upper bound for the $L_1$ distance. They therefore offer substantial (asymptotic) improvement over the ordinary histogram. The ASH does not quite achieve the same kernel estimator rates, but for the finite $n$ that statisticians are faced with, the constants accompanying $n^{-4/(d+4)}$ and $n^{-2/(d+4)}$ determine everything, and it is evident from Theorem 2 that the ASH produces IMSE and IMAD + IAB that match those of the kernel estimator, i.e. (5.2) and (5.3), even for moderate values of $m_i$. This is no coincidence; Scott (1985b) observes that the ASH of (3.3) is close to the kernel estimator (5.1) with $K(u) = \prod_{i=1}^d (1 - |u_i|)^+$, the product triangle kernel. Indeed (5.2) and (5.3) result if we let the $m_i$'s grow to infinity in the expressions of Theorem 2.
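
The rates quoted for the histogram and for the kernel-type estimators follow from balancing a variance term of order $(nh^d)^{-1}$ against a bias of order $h^p$, with $p = 1$ for the histogram and $p = 2$ for the kernel estimator, the GFP and the GFP-ASH. The Python sketch below is only an illustration of that bookkeeping with a common window $h = n^{-\alpha}$; the function name is hypothetical.

```python
from fractions import Fraction


def balanced_rates(d, p):
    """Balance variance 1/(n h^d) against squared bias h^(2p), with h = n^(-alpha).

    Returns (alpha, r2, r1): the best alpha, and exponents such that
    IMSE decays like n^(-r2) and IMAD + IAB like n^(-r1).
    """
    alpha = Fraction(1, d + 2 * p)  # solves 1 - d*alpha = 2*p*alpha
    return alpha, 2 * p * alpha, p * alpha


for d in (1, 2, 3, 6):
    print(d, "histogram:", balanced_rates(d, 1), "kernel/GFP:", balanced_rates(d, 2))
```

The gap between the two families narrows as $d$ grows, since both exponents shrink towards zero.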

$\hat f_{\mathrm{ASH}}$ is thus a computationally convenient approximation to $f^*$ of (5.1) with $K$ the product triangle kernel. This points to the possibility of approximating other kernel density estimators in the same manner, using a different weighting scheme than (3.4). Such an approximation works directly with the binned data, and the computational burden is almost independent of the sample size $n$ of the raw data. In an example that Scott (1985b) reports on, this meant reducing CPU time from hours to minutes.


Scott (1985b) notes that even small to moderate values of the $m_i$ are effective in eliminating the portion of the bias that stems from binning the data.
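
To make the connection between the ASH and the triangle kernel concrete, here is a minimal univariate sketch in Python. It is not the paper's code: it assumes the usual univariate ASH of Scott, $\hat f(x) = (nh)^{-1}\sum_{j=1-m}^{m-1}(1 - |j|/m)\,\nu_{k+j}$ for $x$ in narrow bin $k$ of width $\delta = h/m$, where $\nu_k$ are bin counts, and it compares this with a triangle kernel estimate of bandwidth $h$. The names `ash_1d` and `triangle_kde`, the simulated data, and the grid are all hypothetical.

```python
import math
import random


def ash_1d(data, h, m, origin=0.0):
    """Univariate ASH built from counts in narrow bins of width h/m."""
    delta = h / m
    counts = {}
    for x in data:
        k = math.floor((x - origin) / delta)
        counts[k] = counts.get(k, 0) + 1
    n = len(data)

    def estimate(x):
        k = math.floor((x - origin) / delta)
        total = sum((1.0 - abs(j) / m) * counts.get(k + j, 0)
                    for j in range(1 - m, m))
        return total / (n * h)

    return estimate


def triangle_kde(data, h):
    """Kernel estimate with the triangle kernel K0(u) = max(0, 1 - |u|)."""
    n = len(data)

    def estimate(x):
        return sum(max(0.0, 1.0 - abs(x - xi) / h) for xi in data) / (n * h)

    return estimate


rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(500)]
h, m = 1.0, 50
ash = ash_1d(sample, h, m)
kde = triangle_kde(sample, h)
grid = [-3.0 + 0.05 * i for i in range(121)]
max_gap = max(abs(ash(x) - kde(x)) for x in grid)
print("largest ASH/kernel gap on the grid:", max_gap)
```

Once the counts are formed, evaluating the ASH touches only $2m - 1$ bins per point, whereas the kernel estimate loops over all $n$ observations. Because each observation's weight differs from the kernel weight by at most $1/m$, the gap between the two estimates is bounded by $1/(mh)$.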

Thus $\hat f_{\mathrm{ASH}}$, and relatives, are convenient approximations to kernel estimators, and are as such similar in spirit to the discrete Fourier inversion method introduced in Silverman (1982) (see also Jones and Lotwick (1984)). This technique works for a Gaussian kernel. It is unclear how practical that method is for high-dimensional data.

The GFP-ASH is an interpolated version of a kernel estimator approximation for binned data; see Scott (1985b, Section 6) for further comments.

Finally, it should be pointed out in this subsection that $\hat f_0$, $\hat f_{\mathrm{GFP}}$, $\hat f_{\mathrm{ASH}}$, and $\hat f_{\mathrm{GFP\text{-}ASH}}$ in fact all are kernel estimators, but with complicated kernels, being only piecewise continuous, and not all of them symmetric; see Walter and Blum (1979) and Scott (1985b, Section 3). Another way of obtaining Theorems 1, 2, and 3 would conceivably be to determine these underlying kernels explicitly, then prove precise versions of (5.2), (5.3), but for piecewise continuous and nonsymmetric kernels, and then evaluate the appropriate terms.

5.2.  Choice  of  smoothing  parameters. 

It is natural to choose smoothing parameters so as to minimize the leading terms of either the IMSE or the IMAD + IAB, see for example Freedman and Diaconis (1981) or Scott (1985a).

Consider, for example, the generalized frequency polygon of Section 2, and let us for the moment adopt the $L_1$ view that led to IMAD + IAB as a natural criterion. The leading terms are of the form

$$A_0(nh_1\cdots h_d)^{-1/2} + \sum_{i=1}^d A_ih_i^2,$$

for constants $A_0, A_1, \ldots, A_d$ determined by $f$. Setting partial derivatives equal to zero one finds that the best choice for $h_1,\ldots,h_d$ is

$$h_i = h_i^* = (\tfrac14A_0)^{2/(d+4)}(A_1\cdots A_d)^{1/\{2(d+4)\}}A_i^{-1/2}n^{-1/(d+4)}. \tag{5.6}$$

With this choice,

$$\mathrm{IMAD} + \mathrm{IAB} \le (d+4)(\tfrac14A_0)^{4/(d+4)}(A_1\cdots A_d)^{1/(d+4)}n^{-2/(d+4)}. \tag{5.7}$$

Since $A_0$ is proportional to $\int f^{1/2}dx$ and $A_i$ to $\int |f_{ii}|\,dx$, it emerges that

$$A(f) = \Bigl\{\Bigl(\int f^{1/2}dx\Bigr)^4\int |f_{11}|\,dx\cdots\int |f_{dd}|\,dx\Bigr\}^{1/(d+4)} \tag{5.8}$$

is a natural measure of how difficult a particular $f$ is to estimate using a generalized frequency polygon.
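
The minimisation behind (5.6) can be verified with a few lines of Python. The sketch below is an illustration only: it takes the criterion to be $A_0(nh_1\cdots h_d)^{-1/2} + \sum_i A_ih_i^2$, uses the closed form $h_i^* = (\tfrac14A_0)^{2/(d+4)}(A_1\cdots A_d)^{1/\{2(d+4)\}}A_i^{-1/2}n^{-1/(d+4)}$, and checks that perturbing any coordinate increases the criterion. The constants chosen for $A_0, A_1, A_2, A_3$ and $n$ are arbitrary, and the function names are hypothetical.

```python
import math


def criterion(h, A0, A, n):
    """Leading terms A0 (n h_1...h_d)^(-1/2) + sum_i A_i h_i^2."""
    return A0 / math.sqrt(n * math.prod(h)) + sum(a * hi * hi for a, hi in zip(A, h))


def h_star(A0, A, n):
    """Closed-form minimiser, as in (5.6)."""
    d = len(A)
    common = ((A0 / 4.0) ** (2.0 / (d + 4))
              * math.prod(A) ** (1.0 / (2.0 * (d + 4)))
              * n ** (-1.0 / (d + 4)))
    return [common / math.sqrt(a) for a in A]


def minimum_value(A0, A, n):
    """Value of the criterion at the minimiser."""
    d = len(A)
    return ((d + 4) * (A0 / 4.0) ** (4.0 / (d + 4))
            * math.prod(A) ** (1.0 / (d + 4)) * n ** (-2.0 / (d + 4)))


A0, A, n = 2.0, [0.5, 1.5, 3.0], 1000   # arbitrary illustrative constants
h_opt = h_star(A0, A, n)
print(h_opt, criterion(h_opt, A0, A, n), minimum_value(A0, A, n))
```

The criterion is convex in $(h_1,\ldots,h_d)$ on the positive orthant, so the stationary point is the global minimiser.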

Exactly the same reasoning is valid for the GFP-ASH and for the kernel estimator $f^*$. In both cases $h_i$ should again be taken proportional to $n^{-1/(d+4)}$, and again $A(f)$ of (5.8) appears as a natural measure of the difficulty with which $f$ can be estimated.

Of course $A_0, A_1, \ldots, A_d$ will be unknown, and a natural way to proceed is to estimate the needed quantities $\int f^{1/2}dx$, $\int |f_{11}|\,dx, \ldots, \int |f_{dd}|\,dx$ based on the observed data, and plug in estimates in (5.6). The estimation can be performed nonparametrically, using perhaps a separate kernel estimate or spline type estimate of $f$, and perhaps with separately determined smoothing parameters, for this purpose. Another possibility is to fit perhaps a rough parametric model to the data, and estimate $\int f^{1/2}dx$ etc. using parametric techniques.

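To get a possible benchmark for the choice of $h_1,\ldots,h_d$, assume for the moment that $f$ is $N_d(\mu,\Sigma)$. Clever computations give $\int f^{1/2}dx = 2^{d/2}(2\pi)^{d/4}|\Sigma|^{1/4}$ and $\int |f_{ii}|\,dx = 4e^{-1/2}\sigma^{ii}/(2\pi)^{1/2}$, with $\sigma^{ii}$ the $i$th diagonal element of $\Sigma^{-1}$. The leading terms for IMAD + IAB are of the form discussed above for both the GFP and the GFP-ASH, with $A_0 = (2/\pi)^{1/2}B_0\int f^{1/2}dx$, $A_i = B_i\int |f_{ii}|\,dx$. One arrives at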

h.\  =  (|s|<r“  • .  .or‘*‘')i/2(<i+4)(^«yi/2„-i/{d+4)^  (5  9^ 


where 

For the GFP, $B_0 = b_0^d$ and $B_i = \tfrac18$, where $b_0 = \tfrac12 + (2\sqrt2)^{-1}\log(1+\sqrt2)$. A cautious recommendation is therefore to estimate $\Sigma$ in some robust way, and use

$$h_i = b_0^{2d/(d+4)}\,2^{3d/\{2(d+4)\}}\,\pi^{d/\{2(d+4)\}}\,e^{1/(d+4)}\,(|\Sigma|\sigma^{11}\cdots\sigma^{dd})^{1/\{2(d+4)\}}(\sigma^{ii})^{-1/2}\,n^{-1/(d+4)}. \tag{5.10}$$

For example, $h^* = 1.551\,\sigma n^{-1/5}$ can be used as a starting value for $h$ for the univariate frequency polygon. For the GFP-ASH, on the other hand, $B_0 = J(m_1)\cdots J(m_d)$ and $B_i = \tfrac1{12}\bigl(1 + \tfrac{1}{2m_i^2}\bigr)$. We give the answer for the case that $B_0 = (\tfrac23)^{d/2}$ and $B_i = \tfrac1{12}$ are good enough approximations (moderate $m_i$'s will do). Then

$$h_i^* = 2^{(5d-4)/\{2(d+4)\}}\,3^{-(d-2)/(d+4)}\,\pi^{d/\{2(d+4)\}}\,e^{1/(d+4)}\,(|\Sigma|\sigma^{11}\cdots\sigma^{dd})^{1/\{2(d+4)\}}(\sigma^{ii})^{-1/2}\,n^{-1/(d+4)}. \tag{5.11}$$

In the one-dimensional case $h^* = 1.829\,\sigma n^{-1/5}$.
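
The numerical constants in these Gaussian reference rules are easy to reproduce. The Python sketch below is illustrative and not the paper's code: it evaluates the leading constants of (5.10) and (5.11) as functions of $d$, and, under the extra simplifying assumption of a diagonal covariance matrix (so that $|\Sigma|\sigma^{11}\cdots\sigma^{dd} = 1$ and $(\sigma^{ii})^{-1/2}$ is the $i$th standard deviation), returns per-coordinate window sizes. The function names are hypothetical.

```python
import math

# b_0 = 1/2 + log(1 + sqrt(2)) / (2 sqrt(2)), the univariate GFP constant.
B0_GFP_1D = 0.5 + math.log(1.0 + math.sqrt(2.0)) / (2.0 * math.sqrt(2.0))


def gfp_constant(d):
    """Leading constant of (5.10) for the GFP."""
    e = d + 4
    return (B0_GFP_1D ** (2.0 * d / e) * 2.0 ** (3.0 * d / (2.0 * e))
            * math.pi ** (d / (2.0 * e)) * math.exp(1.0 / e))


def gfp_ash_constant(d):
    """Leading constant of (5.11) for the GFP-ASH with moderate m_i."""
    e = d + 4
    return (2.0 ** ((5.0 * d - 4.0) / (2.0 * e)) * 3.0 ** (-(d - 2.0) / e)
            * math.pi ** (d / (2.0 * e)) * math.exp(1.0 / e))


def reference_windows(sigmas, n, constant):
    """Window sizes for a Gaussian with diagonal covariance (assumption)."""
    d = len(sigmas)
    return [constant(d) * s * n ** (-1.0 / (d + 4)) for s in sigmas]


print(gfp_constant(1), gfp_ash_constant(1))
print(reference_windows([1.0, 2.0], 400, gfp_ash_constant))
```

For a non-diagonal covariance matrix the full factor $(|\Sigma|\sigma^{11}\cdots\sigma^{dd})^{1/\{2(d+4)\}}(\sigma^{ii})^{-1/2}$ would have to be computed from an estimate of $\Sigma^{-1}$.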

Similar reasoning can be used for the IMSE criterion. The typical IMSE has leading terms of the form

$$A_0(nh_1\cdots h_d)^{-1} + \sum_{i,j}A_{ij}h_i^2h_j^2,$$

see Theorems 1 and 3 and (5.2). Put $h_i = a_in^{-\alpha}$, so that the IMSE becomes $A_0(a_1\cdots a_d)^{-1}n^{-(1-d\alpha)} + \sum_{i,j}A_{ij}a_i^2a_j^2n^{-4\alpha}$. The best choice is again $\alpha = 1/(d+4)$, giving

$$\mathrm{IMSE} = \Bigl\{A_0(a_1\cdots a_d)^{-1} + \sum_{i,j}A_{ij}a_i^2a_j^2\Bigr\}n^{-4/(d+4)}, \tag{5.12}$$

where $a_1,\ldots,a_d$ remain to be specified. The values that minimize the expression in the brackets cannot be found in closed form (for $d > 3$), but can be found numerically for given values of $A_0, A_{11}, \ldots, A_{dd}$. This requires the (first stage) estimation of the unknown quantities $\int (f_{ii})^2dx$, $\int f_{ii}f_{jj}\,dx$, by parametric or nonparametric methods. For example, if $f$ is Gaussian with covariance matrix $\Sigma$, then $\int f_{ii}f_{jj}\,dx = \{\sigma^{ii}\sigma^{jj} + 2(\sigma^{ij})^2\}/\{4(4\pi)^{d/2}|\Sigma|^{1/2}\}$, with $\sigma^{ij}$ the elements of $\Sigma^{-1}$, and this may be used to get at least starting values for $h_1,\ldots,h_d$. Comments about this are in Scott (1985a); here we shall only remark that this procedure, for the univariate frequency polygon, leads to $h^* = 2.153\,\sigma n^{-1/5}$, which can be compared with $h^* = 1.551\,\sigma n^{-1/5}$ obtained above with the $L_1$ view.

Scott (1985b) also discusses other methods of determining the smoothing parameters.
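
The univariate IMSE benchmark can be reproduced directly. The Python sketch below is an illustration under stated assumptions: it takes the leading IMSE terms of the univariate frequency polygon to be $2/(3nh) + (49/2880)h^4\int (f'')^2dx$ (the $d = 1$ form given by Scott (1985a)) and uses $\int (f'')^2dx = 3/(8\sqrt\pi\,\sigma^5)$ for a Gaussian density. The function names are hypothetical.

```python
import math


def fp_imse(h, sigma, n):
    """Leading IMSE terms of the univariate frequency polygon, Gaussian f."""
    roughness = 3.0 / (8.0 * math.sqrt(math.pi) * sigma ** 5)
    return 2.0 / (3.0 * n * h) + 49.0 / 2880.0 * h ** 4 * roughness


def fp_imse_window(sigma, n):
    """Minimiser of fp_imse in h, from setting the derivative to zero."""
    roughness = 3.0 / (8.0 * math.sqrt(math.pi) * sigma ** 5)
    return ((2.0 / 3.0) / (4.0 * (49.0 / 2880.0) * roughness * n)) ** 0.2


constant = fp_imse_window(1.0, 1)   # coefficient of sigma * n^(-1/5)
print(constant)
print(fp_imse_window(2.0, 500), constant * 2.0 * 500 ** (-0.2))
```

The $L_2$ view thus asks for a noticeably wider window than the $L_1$ view in this benchmark case.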

5.3.  Concluding  Remarks. 

The previous subsection outlined how IMSE and IMAD + IAB expressions could be used to provide choices for smoothing parameters, i.e. window sizes for our generalizations of histograms. In particular, it was seen that the natural IMAD + IAB criterion gave the more explicit recommendations, and also led to the reasonable measure (5.8) of how difficult the $f$ at hand is to estimate. It is interesting to contrast these results with similar ones for the ordinary histogram (1.1), for which we have already noted (5.4) and (5.5). Without going into the details, let us mark down that the best choice for $h_1,\ldots,h_d$ is

(5.13) 

based on the IMAD + IAB criterion, and that

=  J \fi\dx--- j \fd\dx}^^^‘^''’^^  (5.14) 

emerges as the reasonable measure of difficulty. If $f$ is Gaussian with diagonal covariance matrix, then


If  the  IMSE  criterion  is  used  instead,  then 

hi  =  j  (y^)2^3.}l/2(<i+2)^  j  (/.)2da:}-l/2„-l/(<i+2) 

is  the  best  choice.  For  the  Gaussian  case 

«  3.50(lS|aii . .  .^<«)i/(2«i+4)(^«)-i/2„-i/(<i+2) 


(5.16) 


(5.17) 


Is it sensible to choose window sizes and smoothing parameters on the basis of IMSE and IMAD + IAB? IMSE, for example, is really the expected value of the loss $\mathrm{ISE} = \int (\hat f - f)^2dx$. One can show that ISE/IMSE tends to one in probability, for all the estimators considered in this paper. This lends credibility to this criterion, and a similar justification can be given for IMAD + IAB. The rate at which ISE becomes close to IMSE may be slow, however; for example, one can show that (ISE/IMSE $-$ 1), suitably normalized, has a limiting normal distribution in the histogram case, and likewise has a normal limit in the kernel estimator case.

Let us point out that it makes perfect sense to use for example the IMAD + IAB criterion over some specific region as a means of obtaining smoothing parameter values; the reasoning of 5.2 can equally be applied. A more sophisticated procedure that, however, would complicate computational matters would be to use locally varying $h_1,\ldots,h_d$, say. These could, for example, be specified at a point $x$ as the ones that give minimum IMSE (or estimated IMSE) in a ball of some fixed radius around $x$.

We have demonstrated that simple and computationally efficient variations on histograms can match for example kernel density estimators in performance. It is probably fair to point out, however, that both kernel estimators and the density estimators proposed in the present paper would have severe difficulties in being "statistically efficient" for anything but well behaved densities in higher dimensions, say for $d > 6$; they would require enormous sample sizes to detect possibly finer and interesting structure. One might turn to estimators based on projection pursuit methods, for example, to cope with such problems, see Huber (1985). One can hope, however, that methods like the GFP and the ASH can be useful as building blocks in such a more sophisticated set-up. Imagine, for example, that a "transformation pursuit" method was put to work on some six-dimensional data, and ended up giving a transformation from $(X_1,\ldots,X_6)$ to $(Y_1,\ldots,Y_6)$, say, having the property that $(Y_1,Y_2,Y_3)$ and $(Y_4,Y_5,Y_6)$ become practically independent. Then the GFP-ASH could be used to estimate the densities of $(Y_1,Y_2,Y_3)$ and $(Y_4,Y_5,Y_6)$, in a computationally and statistically efficient way. Then finally the density $f$ for the original $(X_1,\ldots,X_6)$ is obtained by inverse transformation.


Appendix


Behind the statements displayed in Theorems 1, 2, 3 were a variety of Taylor expansions and approximations of integrals by Riemann sums. A more careful study of the remainder terms involved is called for now in order to actually prove the theorems.

Sometimes calculating the "next term" in an expansion, assuming the needed extra smoothness, is more informative than a proof, consisting as it must of bounding various remainder terms. This is done in a couple of instances below.

We shall have occasion to use a multivariate version of what Scott (1985a, p. 349) calls the generalized mean-value theorem. Let $g$ be nonnegative and continuous on a cell $[a,b] = \prod_{i=1}^d [a_i,b_i]$. If $\varphi$ is another continuous function on the cell, then

$$\int_{[a,b]} \varphi(x)\,g(x)\,dx = \varphi(x^*)\int_{[a,b]} g(x)\,dx \tag{A.1}$$

for some $x^*$ in $[a,b]$. This can be proved as follows: It is trivially true if $\int_{[a,b]} g\,dx = 0$, so assume $g_0(x) = g(x)/\int_{[a,b]} g\,dx$ is a density on $[a,b]$. Then (A.1) amounts to $E_0\,\varphi(X) = \varphi(x^*)$ where $E_0$ is expectation w.r.t. $g_0$. But $\varphi$ carries the convex set $\prod_{i=1}^d [a_i,b_i]$ onto an interval, say $[c,d]$, and $E_0\,\varphi(X)$ must be somewhere in that interval. (A.1) is also true for a nonpositive $g$ but not necessarily for $g$ taking both negative and positive values. (For example, $\int_{-1}^{1} x^2\,dx \ne x^*\int_{-1}^{1} x\,dx$.)


A  second  fact  to  be  used  repeatedly  below  is  given  in  the  following  lemma. 


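Lemma A.1. Assume $g : \mathbb{R}^d \to \mathbb{R}$ and its first order partial derivatives are continuous and integrable. Then

$$\sum_k g(\xi_k)\,h_1\cdots h_d = \int g\,dx + O\Bigl(\sum_{i=1}^d h_i\Bigr), \tag{A.2}$$

where the sum is over all cells, the union of which is $\mathbb{R}^d$, each cell has volume $h_1\cdots h_d$, and $\xi_k$ is an arbitrary point in cell number $k$.

An explicit and generous bound is available if the mixed higher order derivative $g_{1\cdots d}(x) = (\partial^d/\partial x_1\cdots\partial x_d)\,g(x)$ exists, and all the functions $g_{ij}(x) = (\partial^2/\partial x_i\partial x_j)\,g(x)$, $i < j$; $g_{ijk}(x) = (\partial^3/\partial x_i\partial x_j\partial x_k)\,g(x)$, $i < j < k$; $\ldots$, $g_{1\cdots d}(x)$ are integrable. In fact,

$$\Bigl|\sum_k g(\xi_k)\,h_1\cdots h_d - \int g\,dx\Bigr| \le \sum_i h_i\int |g_i|\,dx + \sum_{i<j} h_ih_j\int |g_{ij}|\,dx + \cdots + h_1\cdots h_d\int |g_{1\cdots d}|\,dx. \tag{A.3}$$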


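Proof: Consider a cell $\prod_{i=1}^d (a_i,b_i] = I$ and an arbitrary point $\xi = (\xi_1,\ldots,\xi_d)$ in it. Then

$$
\begin{aligned}
g(x) - g(\xi) &= \sum_{i=1}^d \{g(\xi_1,\ldots,\xi_{i-1},x_i,x_{i+1},\ldots,x_d) - g(\xi_1,\ldots,\xi_{i-1},\xi_i,x_{i+1},\ldots,x_d)\} \\
&= \sum_{i=1}^d \int_{\xi_i}^{x_i} g_i(\xi_1,\ldots,\xi_{i-1},u_i,x_{i+1},\ldots,x_d)\,du_i.
\end{aligned}
$$

Hence, using Fubini's theorem,

$$
\begin{aligned}
\Bigl|\int_I g\,dx - g(\xi)\,h_1\cdots h_d\Bigr| &= \Bigl|\int_I \{g(x) - g(\xi)\}\,dx\Bigr| \\
&\le \sum_{i=1}^d \int_I\int_{a_i}^{b_i} |g_i(\xi_1,\ldots,\xi_{i-1},u_i,x_{i+1},\ldots,x_d)|\,du_i\,dx \\
&= \sum_{i=1}^d (b_i - a_i)\int_I |g_i(\xi_1,\ldots,\xi_{i-1},x_i,\ldots,x_d)|\,dx.
\end{aligned}
$$

Now use this bound for each cell $I_k$:

$$
\begin{aligned}
\Bigl|\int g\,dx - \sum_k g(\xi_k)\,h_1\cdots h_d\Bigr| &\le \sum_k \Bigl|\int_{I_k} \{g(x) - g(\xi_k)\}\,dx\Bigr| \\
&\le h_1\int |g_1|\,dx + h_2\sum_k\int_{I_k} |g_2(\xi_{k,1},x_2,\ldots,x_d)|\,dx \\
&\quad + \cdots + h_d\sum_k\int_{I_k} |g_d(\xi_{k,1},\ldots,\xi_{k,d-1},x_d)|\,dx.
\end{aligned}
$$

That $\sum_k\int_{I_k} |g_i(\xi_{k,1},\ldots,\xi_{k,i-1},x_i,\ldots,x_d)|\,dx = \int |g_i|\,dx + \epsilon(h_1,\ldots,h_d)$, where $\epsilon(h_1,\ldots,h_d) \to 0$ as $\max\{h_1,\ldots,h_d\} \to 0$, follows from continuity and Riemann integrability of $|g_i(x)|$. This proves (A.2).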

Next let us prove (A.3) under the stated extra assumptions. The proof is based on the following monstrous algebraic decomposition:

$$
\begin{aligned}
g(x) - g(\xi) &= \sum_i \int_{\xi_i}^{x_i} g_i(x_1,\ldots,u_i,\ldots,x_d)\,du_i \\
&\quad - \sum_{i<j}\int_{\xi_i}^{x_i}\!\int_{\xi_j}^{x_j} g_{ij}(x_1,\ldots,u_i,\ldots,u_j,\ldots,x_d)\,du_i\,du_j \\
&\quad + \sum_{i<j<k}\int_{\xi_i}^{x_i}\!\int_{\xi_j}^{x_j}\!\int_{\xi_k}^{x_k} g_{ijk}(x_1,\ldots,u_i,\ldots,u_j,\ldots,u_k,\ldots,x_d)\,du_i\,du_j\,du_k \\
&\quad - \cdots + (-1)^{d-1}\int_{\xi_1}^{x_1}\!\cdots\!\int_{\xi_d}^{x_d} g_{1\cdots d}(u_1,\ldots,u_d)\,du_1\cdots du_d.
\end{aligned}
\tag{A.4}
$$

For example, for $d = 2$ (A.4) amounts to

$$
\begin{aligned}
g(x_1,x_2) - g(\xi_1,\xi_2) &= g(x_1,x_2) - g(\xi_1,x_2) + g(x_1,x_2) - g(x_1,\xi_2) \\
&\quad - \{g(x_1,x_2) - g(x_1,\xi_2) - g(\xi_1,x_2) + g(\xi_1,\xi_2)\}.
\end{aligned}
$$

Since a $p$-dimensional integral of $(\partial^p/\partial y_1\cdots\partial y_p)\,q(y_1,\ldots,y_p)$ over a rectangle can be written as an alternating sum of the $2^p$ corner values of $q$, (A.4) has $3^d - 1$ terms on the right hand side. (A.4) may be proved by carefully keeping track of the number of pluses and minuses in front of each particular term, and convincing oneself that everything cancels except $g(x) - g(\xi)$.


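Now (A.3) can be established. Consider the particular cell $I$ first. Then

$$
\begin{aligned}
\Bigl|\int_I \{g(x) - g(\xi)\}\,dx\Bigr| &\le \sum_{i=1}^d \int_I\int_{a_i}^{b_i} |g_i(x_1,\ldots,u_i,\ldots,x_d)|\,du_i\,dx \\
&\quad + \sum_{i<j}\int_I\int_{a_i}^{b_i}\!\int_{a_j}^{b_j} |g_{ij}(x_1,\ldots,u_i,\ldots,u_j,\ldots,x_d)|\,du_i\,du_j\,dx \\
&\quad + \cdots + \int_I\int_{a_d}^{b_d}\!\cdots\!\int_{a_1}^{b_1} |g_{1\cdots d}(u)|\,du_1\cdots du_d\,dx \\
&= \sum_i (b_i - a_i)\int_I |g_i|\,dx + \sum_{i<j}(b_i - a_i)(b_j - a_j)\int_I |g_{ij}|\,dx \\
&\quad + \cdots + (b_1 - a_1)\cdots(b_d - a_d)\int_I |g_{1\cdots d}|\,dx.
\end{aligned}
$$

(A.3) follows by summing over all cells.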


Remark A.1. The lemma provides a multidimensional generalization of results reached by Freedman and Diaconis (1981), cf. their Corollary 2.24.

Still another Riemann sum lemma is needed. The lemma provides a strengthening of the previous one for the case that $\xi_k$, in the notation of (A.2) and (A.3), is the centre point in cell $k$.


Lemma A.2. Let $g : \mathbb{R}^d \to \mathbb{R}$ have continuous and integrable partial derivatives $g_i$, $g_{ij}$, $g_{ijk}$. Let $(a_k - \tfrac12h,\, a_k + \tfrac12h] = \prod_{i=1}^d (a_{k,i} - \tfrac12h_i,\, a_{k,i} + \tfrac12h_i]$ be cell number $k$ in a grid of cells with volume $h_1\cdots h_d$. Then

$$\sum_k g(a_k)\,h_1\cdots h_d = \int g\,dx - \tfrac1{24}\sum_{i=1}^d h_i^2\int g_{ii}\,dx + O(h_1^3 + \cdots + h_d^3). \tag{A.5}$$

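Proof: Consider a single cell first, and omit the subscript $k$ for a moment. Then

$$
\begin{aligned}
\int_{(a-\frac12h,\,a+\frac12h]} g(x)\,dx - g(a)\,h_1\cdots h_d &= \int_{(a-\frac12h,\,a+\frac12h]} \{g(x) - g(a)\}\,dx \\
&= \int_{(a-\frac12h,\,a+\frac12h]} \Bigl\{\sum_{i=1}^d g_i(a)(x_i - a_i) + \tfrac12\sum_{i,j} g_{ij}(\tilde a(x))(x_i - a_i)(x_j - a_j)\Bigr\}\,dx \\
&= \tfrac1{24}\sum_{i=1}^d h_i^2\,g_{ii}(\tilde a_i)\,h_1\cdots h_d + \sum_{i<j}\int_{(a-\frac12h,\,a+\frac12h]} g_{ij}(\tilde a(x))(x_i - a_i)(x_j - a_j)\,dx,
\end{aligned}
$$

with $\tilde a(x)$ between $a$ and $x$ and each $\tilde a_i$ in the cell, where we used the generalized mean-value theorem for terms in the first sum. This is also possible for the second sum, but only after circumventive manoeuvring, which is necessary since $(x_i - a_i)(x_j - a_j)$ is neither nonnegative nor nonpositive on $(a - \tfrac12h,\, a + \tfrac12h]$.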


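Divide the cell into four regions, $\Omega_1,\ldots,\Omega_4$, where $\Omega_1$ has $(a_i - \tfrac12h_i,\, a_i)$ and $(a_j - \tfrac12h_j,\, a_j)$ instead of $(a_i - \tfrac12h_i,\, a_i + \tfrac12h_i]$ and $(a_j - \tfrac12h_j,\, a_j + \tfrac12h_j]$, where $\Omega_2$ similarly has $(a_i - \tfrac12h_i,\, a_i)$ and $[a_j,\, a_j + \tfrac12h_j]$, $\Omega_3$ has $[a_i,\, a_i + \tfrac12h_i]$ and $(a_j - \tfrac12h_j,\, a_j)$, and finally $\Omega_4$ has $[a_i,\, a_i + \tfrac12h_i]$ and $[a_j,\, a_j + \tfrac12h_j]$. The mean-value theorem can be employed for each of these four regions, and gives

$$\int_{(a-\frac12h,\,a+\frac12h]} \varphi(x)(x_i - a_i)(x_j - a_j)\,dx = \tfrac1{64}h_ih_j\{\varphi(x_1) - \varphi(x_2) - \varphi(x_3) + \varphi(x_4)\}\,h_1\cdots h_d$$

for any continuous $\varphi$, for suitable $x_1$ in $\Omega_1$, $\ldots$, $x_4$ in $\Omega_4$. These observations lead to

$$
\begin{aligned}
\sum_k g(a_k)\,h_1\cdots h_d - \int g\,dx &= -\sum_k \int_{I_k} \{g(x) - g(a_k)\}\,dx \\
&= -\tfrac1{24}\sum_{i=1}^d h_i^2\sum_k g_{ii}(\tilde a_{k,i})\,h_1\cdots h_d \\
&\quad - \tfrac1{64}\sum_{i<j} h_ih_j\sum_k \{g_{ij}(\tilde a_{k,ij,1}) - g_{ij}(\tilde a_{k,ij,2}) - g_{ij}(\tilde a_{k,ij,3}) + g_{ij}(\tilde a_{k,ij,4})\}\,h_1\cdots h_d,
\end{aligned}
$$

where $\tilde a_{k,i}$, $\tilde a_{k,ij,1}, \ldots, \tilde a_{k,ij,4}$ all lie in cell number $k$. (A.5) follows upon application of Lemma A.1.

If $g$ is only assumed to have second order continuous derivatives, then the reasoning above is still valid, but the remainder in (A.5) is then in general only $o(h_1^2 + \cdots + h_d^2)$.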

Proof  of  Theorem  1:  We  will  show 


/ {bias(®)}2da:  =  ^  [ ifu^dx 

*=1 


(A.6) 


j  Var{/(x)}dx  =  (|)‘'(n/ii  •■■hd)  +  O  (“^777^)  >  (^•'^) 

J  |bias(x)|di<  j  |/i,|dx  +  0  > 

J  mad{/(x)}dx  =  j  f^^^dx  {nhi  ■  ■■hd)~^^^ 


(A.8) 


(A.9) 


+  0((nhi  •  • ’hd)  ^  +  y^/t,(nhi  •  ■■hd)  ^^^), 

1=1 

which clearly suffices. Some of the error estimates can be improved, see Remark A.2 below. Consider the Taylor expansions that led to (2.7) and (2.15). One has


.  1  .. 

/(x)  =  /(a)  +  X)  /i(a)(®t-  -  Of)  +  r  X)  fie(a)(xi  -  ai)(xe  -  at) 


t=l 


i,t 


under  the  assumptions  of  the  theorem,  for  some  between  a  and  x.  Hence 
Po(j)  =  f  f{x)dx 


»=1 


1  ..  1  .  . 


i=l 


ijtt 


+ 


rr'o  Ji{j-i)h,ih] 


iA9 

by  (A.l),  where  ansj  is  somewhere  in  [a  +  {j  —  l)/ij  a  +  jfh].  A  more  complete  version  of  (2.7) 
and  (2.15)  is  accordingly 

d 


»=1 


bias(i)  =  -  «t)*}  +  5(®), 


(A.10) 


where 


^(®)  =  X^  ^  iiU  [  XiXiXt  dxl{hx  ■••hd) 


}i,-,3d  i,t,» 

1 


-  X)  fit,  (2*)(x,-  -  ai){pt  -  at){xs  -  a,). 


(A.11) 


i^ts 

Let us prove (A.6). Write (A.10) as

bias(i)  =  b{x)  +  5(x)  =  6(x)  +  X^ 

i,t,» 

The  analysis  of  Section  2  implies  together  with  the  lemma  above  that 
d  r  -  /  d 


j  {b{x)}^dx = Y  I  jihydx + o  I 


»=i 

+ 


(A.12) 


(The  lemma  can  be  applied  since  {d/dxt)  fiifjj  =fiu  fjj  +  fa  absolutely  integrable: 

/I  fat  /yj|2x  <  oo.)  Next  consider  S(x)  =  Ei,e.t 
One has


Siii{x)  =  X  {  "f^^i  (a*)(xi  -  aif  I  , 


so  that 

f  {««.(*)}’<  /  {E'=jW 

Jl{k)  <>o  lb  ^ 

+  /  fiii  (2*)^(a:i  -  Oifdx], 

Jl{k) 

Employing  the  generalized  mean  value  theorem  and  some  analysis  we  get  the  upper  bound 
fiii  ••'hd  +  u^^i  fiii  ■■■hd.  Summing  over  all  cells  we 

arrive  at  / =  0{h^).  Similar  analysis  for  the  other  terms,  combined  with  the  rough 
inequality  S{x)^  <  2*^*  ^i,e,s  gives  / {5(i)}^di  =  O  •  This  also  shows,  using 

/{6(x)Pdx  =  o(Eti/»|),that 


J  b{x)S{x)dx\  =  O  ^5^  j  . 


All  this  proves  (A.6). 

(A.8)  can  be  proved  in  a  similar  way;  there  are  in  fact  fewer  details  to  work  through,  and 
we  omit  them. 

Next  up  is  (A.7).  An  exact  expression  is 

Var{/(z)}  =  (nhi  •  •  -^<1)“^ 


- < 

n 


Po(j) 


hi'-'hd 


for  X  G  I{k)  =  (a  —  |h,  a  +  |h],  where 
PoU) 


t=l 


(A.13) 


with  diij,  Oit^j  being  somewhere  in  (a+(j  — l)h,  o+yh].  ‘It  is  not  difficult  to  get  Cj^x^dx  = 
and  cy(x)cyi(x)dx  =  for  J,/  e  {0,1}^ 
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from  (2.9)  and  (2.10).  It  follows  after  some  algebraic  efforts  that 


f  Yax  f{x)dx={nhx---hd)  ^{(|)V(a)Ai  •  • ' 

Jl{k)  I  J 

1=1  j 

j  ^ 

j,}'  3=1  •■ 

d  1  -1 

+  /(a)  E  +  (-1)^"+'}  + . . .  hi . . . A,. 

i=l 

This  implies,  using  the  Riemann  sum  lemmas,  that 

Var{/(x)}dx  =  (nhi  •  •  •  h^)-^  |  (|)‘'  +  O  h?^  | 

.ogv)), 

i.e.  (A.7)  is  true.  (A  minor  technicality  is  that  ay  . above  may  lie  outside  the  cell  I{k)]  it  is 
however  at  einy  rate  in  (a  —  A,  o  +  A] ,  and  a  version  of  the  Riemann  sum  lemma  can  be  stated 
and  proved  for  such  occasions.) 

Let  us  finally  prove  (A.9).  Let  Yo(^k',j]i)  be  the  indicator  of  the  event  that  Xi  falls  in 
.h{k’,j)  =  (a+  (j  -  1)A,  a  +  jA],  so  that  Yo{k-,j)  =  27=1  Yo{k-,j‘,i).  We  may  write 

/(i)  -  B/»  = 

V»£I 


where 


^n,t(®)  =  (Ai  •  •  •  hd)  Cy(x){yo(fc;  j]i)  -  Po(i)}. 


o-n(a;)^  =  Var{A„,,(x)} 


=  (Ai  •  •  •  Ad)  ^  cy  (x)2po(i){l  -  Po(i)}  -  E  Cj(®)cj'(®)Po(i)Po(j') 


(A.14) 


Ai-.-Ad 
That


Vn<rn(x)  ^  X 


<  B 


Pn{x) 

\/n(Tn{xY' 


where $\rho_n(x) = E|A_{n,i}(x)|^3$ and $B$ is a universal constant, is a consequence of a nice Berry-Esseen type inequality proved in Devroye and Györfi (1985, Lemma 8, p. 90). Hence


E\Ux)  -  f{x)\  =  (nhi  •  •  +  e„(x)} 


(A.15) 


where $|\epsilon_n(x)| \le B\,\rho_n(x)/\{\sqrt{n}\,\sigma_n(x)^2\}$.


Consider  first  the  contribution  from  (Tn{x).  Combining  (A.13)  and  (A.14)  one  sees  that  the 
leading  terihs  in  a  Taylor  expansion  for  is  f{a)  Cj{x)^  +  Yi=i 

cy(x)*.  An  expansion  for  o’n(®)  accordingly  starts  out  eis 


1/2 


»=i 


-1/2 


(A.16) 


+ £  /(«)  I X) f  + 


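This implies, after having properly tended to remainder terms, that

$$\int_{I(k)} \sigma_n(x)\,dx = \Bigl\{\tfrac12 + \frac{1}{2\sqrt2}\log(1+\sqrt2)\Bigr\}^d f(a)^{1/2}\,h_1\cdots h_d$$

and consequently

$$\int \sigma_n(x)\,dx = \Bigl\{\tfrac12 + \frac{1}{2\sqrt2}\log(1+\sqrt2)\Bigr\}^d\int f^{1/2}dx + O\Bigl(\sum_{i=1}^d h_i\Bigr) \tag{A.17}$$

provided $\int |f_i|/f^{1/2}\,dx$ is finite, $i = 1,\ldots,d$.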

One  will  in  fact  usually  have  O  as  an  error  term  above.  This  is  because  the  sec¬ 
ond  term  of  (A.16)  integrates  to  zero  over  I{k)  and  because  /  f^f^dx  —  Yk  ’”hd  = 

Yi=i  /  f^fi^y^^dx  -f  O  (^f-i  hf^ .  The  error  term  of  (A.17)  can  be  shown  to  be 
O  provided  \  and  have  finite  integrals. 
It remains to consider the contribution to IMAD from $\epsilon_n(x)$. The absolute third moment of $A_{n,i}(x)$ can be bounded as follows:

{1  -  ^Po(j)}l  -  5^C;(a:)po(i)|' 

J  J 

<  (hi”-  hd)~^l^  i  +  ^Po(j)  +  S  cy»(x)po(i') 

I  J  i  ]'  j' 

<  3(hi  •  •  •  Cj(x)  • 

(The  bound  can  be  somewhat  improved,  but  not  its  order  of  magnitude.)  By  the  bound  quoted 
after  (A.  15),  therefore,  • 

,  3B  E,c,(x)po(i)/(fti---M 

(nhi...h,)i/2  • 

Here  Y,jCj(x)pQ(j)/(hi---hd)  =  /(a)  +  0(Ei=i  ^f)  <^n(a:)^  =  /(«)  Ey  + 

0(Y^f_^  hi).  It  follows  that 


f  €n(x)dx\  < 

Jl(k) 


ZB 


L 


-1 


IK  i 


(nhi  -  --hd)^!^  Juk) 

=  (nhi  -  -•hd)~^^^  |3(— )‘*B  +  O  |  hi  •  •  -  hci. 

The  available  bound  on  ^^(x)  therefore  leads  (only)  to 

\  j  c„(x)dx|  <  3(^)‘*B  Volume(f2)(nhi  •  --hd)~^^^  +  O  hj/ (nhi  •  ”hdYl^  . 


Combining this bound with (A.15) and (A.17) finally proves (A.9) and Theorem 1.

Remark A.2. One sometimes gets a better idea of the typical magnitude of error terms if the "next term" of the expansion in question is computed, assuming extra smoothness. Consider for example the IMSE expression for the GFP obtained in Theorem 1. One may go through the calculations of Section 2 once more, but including the next order term in each expansion. For example,


bias(x)  =  -  J(xi  -  a,)^} 


i-1 

d 


t=l 


12' 

i' 


+  -  "«)  -  l(®<  -  -  a<)} 


is a more complete version of (2.15). And after a fair amount of algebraic and analytic details one arrives at


IMSE  =(f)‘'{l  +  Ei'‘*-  / 


1=1 

f^dx  -  2  \hi  [ (/,)' 

i=i 


t=l 

d 


+E 

»=i 


71 


241920 


jAf  I  (hi) 


iuYdx 


I 


distinct 


Proof of Theorem 2. The steps to be taken are similar to, but sometimes more involved than, the ones displayed in the proof of Theorem 1, and most of the details will be omitted.

One can give a longer and exact expression for the bias than the Taylor approximation (3.7), using an exact version of (3.5). Then one is led to an analogue of the result stated before (3.8). A key observation is then that

UiYdx  +  O  Sf ^  , 

which  follows  by  Lemma  A.2,  since  =  2{/j/(x)}*  +  2fj{x)  f  ju  (x)  integrates  to 

zero.  Hence  the  error  term  corresponding  to  summing  up  Yyj=i  Si- terms  is 


k 
^1^) = summing up /yy(xo)^^/ 5i • • - 5^ terms, for example, is

seen  to  lead  to  remainder  with  magnitude  0(Z)i=i  *”< )  =  /”*<),  and  similarly  for 

the other contributions, so 0(Y'.f—i hf/m,) is the remainder for the integrated squared bias.

One can similarly scrutinize the contributions to IAB, using Lemmas A.1 and A.2. There
are  remainders  of  size  0(Z)<Li  =  C>(Z)<=i  and  0(X)i=i  Sf  m?)  =  0(E,^=i 

and  the  latter  one  dominates. 
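
The key observation above rests on the fact that an exact derivative of a smooth, integrable function integrates to zero; in one dimension, for instance, $(f'f'')' = (f'')^2 + f'f'''$, so that $\int f'f'''\,dx = -\int (f'')^2\,dx$. The sketch below checks this numerically, assuming a univariate standard normal density for $f$ (a choice made here purely for illustration) and a simple midpoint rule on a truncated range.

```python
import math


def phi(x):
    """Standard normal density (the illustrative choice of f)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def d1(x):
    """First derivative of phi."""
    return -x * phi(x)


def d2(x):
    """Second derivative of phi."""
    return (x * x - 1.0) * phi(x)


def d3(x):
    """Third derivative of phi."""
    return (3.0 * x - x**3) * phi(x)


def integrate(g, lo=-12.0, hi=12.0, steps=48000):
    """Midpoint-rule approximation to the integral of g over [lo, hi]."""
    width = (hi - lo) / steps
    return width * sum(g(lo + (j + 0.5) * width) for j in range(steps))


int_d2_sq = integrate(lambda x: d2(x) ** 2)      # integral of (f'')^2
int_d1_d3 = integrate(lambda x: d1(x) * d3(x))   # integral of f' f'''

# The two integrals should cancel, since (f' f'')' integrates to zero.
print(int_d2_sq, int_d1_d3, int_d2_sq + int_d1_d3)
```

For the standard normal the first integral has the closed form $3/(8\sqrt{\pi})$, which gives an independent check on the numerical value.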


Next consider IVAR. An exact version of (3.5) based on an exact second-order Taylor
expansion is

d 


pin,  *<i)/(5i  ■  •  •  ^<j)  =  f{xo)  +  ^2 

i=l 

1  ..  1 
+  2  ^  /jj (*».«) (*i  +  ^ 


d 

Y. 

J=1  3<t 

where $x^*_{i,jj}$ and $x^*_{i,jl}$ are in $(x_0 + (i - \tfrac12)\delta,\ x_0 + (i + \tfrac12)\delta]$, $1 - m_j \le i_j \le m_j - 1$, $j = 1, \ldots, d$. (We
ignore some slight complications of no consequence that enter the situation for $\sum_{j<l}$ terms
when one or more $i_j$ is zero; then $(x_j - x_{0,j})(x_l - x_{0,l})$ is neither nonnegative nor nonpositive
on the cell, and the generalized mean-value theorem must be applied with cutting and pasting.
See the proof of Lemma A.2.) From a formula in Section 3.2, therefore,

Var{/ASHW}  =  (’•'■1  •  •  ■'‘ir‘{(5)'' n(l  +  i)/ W 


J=1 


+  §  E  «)■  E  +  ■  ■  ■} 

-  i{/(*o)  +  ^ E«i  E 4(®i, «)(■■?  + 


Now 


«?E4{S(.«)(-1  +  =  °(«i  “JlfeM)!). 


12  mi  •  •  •  trid 

where  |/yy(a:o)l  =  ^^^xe[xo-h,  xo-^h]  Even  though  x^  may  be  outside  of  Cq  =  (xq  - 

|5,  xo  +  |5],  one  may  show  that  the  sum  of  5?  over  all  cells,  is 

5?my{/ \ fjj\dx + 0{hi + • • • + hd)}) using techniques as in Lemmas A.1 and A.2. The result
from all this is that


/  Var{4sHW}‘i*=(»'‘i--W-‘{(5)'‘n  +  +°  (e*/ 


It  remains  only  to  take  care  of  IMAD.  One  has 

[  ™ad{/^g{j(i)}<fa:  =  (n^i  •  •  •  Sd)~^^^E\  ^  B„,a 

Jco  \/n 

by  (3.14),  writing 


*!»***»*<£ 


mi  •  •  •  rrid 


where  k)  =  Y{i;k)  is  an  indicator  for  the  event  that-X*  falls  in  C'o(i)  =  (®o  + 

(*  ~  |)^»  ®o  +  (*  +  1)5]  >  so  that  Y'(»)  =  Y (t;  k).  Using  Lemma  8  of  Devroye  and  Gyorfi 

(1985,  p.  90)  again, 

say, where $\sigma_n^2 = \mathrm{Var}\,B_{n,k}$ and $|\varepsilon_n| \le B\rho_n/(\sqrt{n}\,\sigma_n^2)$, $\rho_n = E|B_{n,k}|^3$. Going once more through
arguments that resemble those used in the proof of Theorem 1, the result is that the contribution
to IMAD from the $\sigma_n$ part is


{nhi  •  •  •  hi)  i 


provided  the  integrals  f  \fij\/ are  finite,  and  that  the  contribution  from  the  en  part  over 
some  bounded  region  R  is  less  than 

(n5i  •  •  •  S^~^I^ZB  •  •  •  rrid  (5i  •  •  •  5^)”^/^  Volume(i2) 


mT  •  •  •  m  j 
=  35-^ - Volume(i2). 

flAi  "  •  • 


This  proves  Theorem  2. 
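
For orientation, here is a minimal one-dimensional sketch of an average shifted histogram of the type analysed in Theorem 2, written in the usual form due to Scott: fine bins of width $\delta = h/m$, with the counts in neighbouring fine bins combined using triangular weights $1 - |i|/m$, $1 - m \le i \le m - 1$. The function names, the padding of the grid by $h$ on each side and the simulated normal sample are assumptions made here; the helper `fp_of_ash` linearly interpolates the ASH values between fine-bin midpoints, which is one way to read the frequency polygon of an ASH in one dimension.

```python
import numpy as np


def ash_1d(data, h, m):
    """Average shifted histogram on a fine grid of bin width delta = h / m.

    Returns the fine-bin midpoints and the ASH density value on each fine bin.
    """
    data = np.asarray(data, dtype=float)
    delta = h / m
    # Pad by h on both sides so the triangular weights never run off the grid.
    lo = data.min() - h
    nbins = int(np.ceil((data.max() + h - lo) / delta))
    hi = lo + nbins * delta
    counts, edges = np.histogram(data, bins=nbins, range=(lo, hi))
    # Triangular weights 1 - |i|/m for i = 1-m, ..., m-1 (they sum to m).
    weights = 1.0 - np.abs(np.arange(1 - m, m)) / m
    density = np.convolve(counts, weights, mode="same") / (data.size * h)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, density


def fp_of_ash(x, centers, density):
    """Linear interpolation of ASH values between fine-bin midpoints."""
    return np.interp(x, centers, density)


# Hypothetical usage on simulated data.
rng = np.random.default_rng(0)
sample = rng.normal(size=500)
centers, density = ash_1d(sample, h=0.8, m=4)
delta = centers[1] - centers[0]
print(density.sum() * delta)  # total mass of the piecewise-constant ASH
```

With `m=1` the weights reduce to a single 1 and the function returns the ordinary histogram with bin width `h`, which is a convenient sanity check on the weighting.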


Proof  of  Theorem  3:  We  are  in  a  position  to  use  techniques  displayed  in  the  proofs  of 
Theorems  1  and  2  and  omit  most  details. 
One has


'®/gfp-ash(®)  = 

*  TYlt 


• »  ^d)  P(4, 


•  >  ^d) 
•Sd 


and one can write down an exact expression for $p(\ell_1, \ldots, \ell_d)/(\delta_1 \cdots \delta_d)$ based on an exact third-order
Taylor expansion for $f$. Equation (4.7) displays the first terms in the expression for
$p(\ell_1, \ldots, \ell_d)/(\delta_1 \cdots \delta_d)$. The next term, i.e. the exact remainder, has | fm (5m,«)(^f — + ^ - 1)5^?

for  some  am^t  in  (a + (^  — 1)5,  a+lS],  and  some  other  terms  of  the  same  magnitude.  Reasoning 
as  in  the  proof  of  Theorem  1,  one  has 


bias(i)  =  b{x)  +  5(a;)  =  b{x)  +  ^  5i,i(x), 

i,},k 

with  6(x)  =  Yl,i=i  /t.(a){n("*<  +  1)^/  “  K®*  ~  x  e  {a  -  y,  a+  |5],  and,  for  ex¬ 
ample,  5, •.-.(x)  =  |5?X)/  fm  “  |^<  +  ^  -  \)  for  ®  the  same  cell.  These 

facts can be used to show that $\int b(x)^2\,dx$ equals the right-hand side of (4.9) with remainder

o  (Eti  hfs,)  =  O  (Eti  A? M)  =  O  (Eiil  «?mj) .  Also,  /  =  O  (e?=i  <?">♦)  = 

o  (Eti  A»/m?),  aid  I  /Ka’ji'Cxjdxl  =  {O  (Eti  Aj)  O  (Et,  A?/m|)  }''"=  O  (Et.  A?/m,.).| 

Similar analysis shows

lAB  <  E  H**'  (*  +  2^)  /  °  (S  A?/-^)  • 


Next  up  is  IVAR.  One  has 


V«{/gFP-ASH  W>  =  ("Ai  •  •  •  A,)-‘  E 

_  f  P(^) 

fi  TTli  •  ■  •  TTld  Sx‘  •  •Sfi  * 


and the expansion for $p(\ell)/(\delta_1 \cdots \delta_d)$ mentioned above can be used to get

{/GFP-ASH(=')}‘f®  =  ("^1  ■ 

S]  3 

+  ^0{S^m^fii{a))Si  ---Sd} 

«=i 

-  ^{/(a)^5i  ’-•Sd  +  ^^  0{Sfm1f{a)fii{a))Si  -  -  -  Sd}, 
and hence, using once more the Riemann sum lemmas,


IVAR  =  +  O  fdx  +  O  (^ki\}. 


It  remains  to  deal  with  IMAD.  One  has 


niad{/QFp.^SIj(x)}  =  (nhi •■•hd) 


from  Section  4.3,  where 


c„,»  w  =  (Ai  •  ■■k,r'l^  ■£  T{e){Y(t,  k)  -  p{l)}. 


Once  more 


where $\tau_n(x)^2 = \mathrm{Var}\,C_{n,k}(x)$ and $|\eta_n(x)| \le B\,E|C_{n,k}(x)|^3/\{\sqrt{n}\,\tau_n(x)^2\}$. Reasoning as on previous
occasions, one may show


'/« 


for each bounded region $\Omega$, with the constant in question proportional to the volume of $\Omega$. Also,

"  mi  •  • -ma  5i  •  •  *0^  mi  •  - di  •  • -dd  J 

=  /(“)  n  — * — 53 - + E(<i  -  5)3:^ 


i=i  *”•?  (.1  y 


2  mi  •  ••rrid 


i-l 


3  mi  •  •  -mrf 

1,,.  m' 


2  ^  2  mi---m<i 


+0  . 


<»=1 
so that


+  ^o(«?mKlAWI/W-‘'“  + /iWVW'"'’})- 


i=l 

It  follows  that 


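$$\int_{(a-\frac12\delta,\,a+\frac12\delta]} \tau_n(x)\,dx = f(a)^{1/2}\,\delta_1\cdots\delta_d\,J(m_1)\cdots J(m_d) + \sum_{i=1}^d O\bigl(\delta_i^2 m_i^2\{|f_{ii}(a)|\,f(a)^{-1/2} + f_i(a)^2 f(a)^{-3/2}\}\bigr)\,\delta_1\cdots\delta_d,$$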


and  finally  that 


j  Tn{x)dx  =  J(mi)  •  •  •  J(md)  j  f^^^dx  +  O  , 


provided $|f_{ii}|/f^{1/2}$ and $|f_i f_j|/f^{3/2}$ have finite integrals.

This proves Theorem 3. ∎